
METHODS FOR MAKING POLYNUCLEOTIDES 
AND PURIFYING DOUBLE-STRANDED POLYNUCLEOTIDES

TECHNICAL FIELD 
The present invention is generally directed to the fields of genetic and protein
engineering and molecular biology. In particular, the invention provides methods for
identifying and purifying double-stranded polynucleotides lacking base pair mismatches,
insertion/deletion loops and nucleotide gaps.

The present invention is generally directed to the fields of protein and genetic
engineering and molecular biology. In one aspect, the invention is directed to libraries of
oligonucleotides and methods for generating any nucleic acid sequence, including synthetic
genes, antisense constructs and polypeptide coding sequences. In one aspect, the libraries of
the invention comprise oligonucleotides comprising restriction endonuclease restriction sites,
e.g., Type-IIS restriction endonuclease restriction sites, wherein the restriction endonuclease
cuts at a fixed position outside of the recognition sequence to generate a single-stranded
overhang. The polynucleotide construction methods comprise use of libraries of pre-made
multicodon (e.g., dicodon) oligonucleotide building blocks and Type-IIS restriction
endonucleases.

In one aspect, the invention is directed to methods for generating sets, or
libraries, of nucleic acids encoding chimeric antigen binding molecules, including, e.g.,
antibodies and related molecules, such as antigen binding sites and domains and other
antigen binding fragments, including single and double stranded antibodies. This invention
provides methods for generating new or variant chimeric antigen binding polypeptides, e.g.,
antigen binding sites, antibodies and specific domains or fragments of antibodies (e.g., Fab or
Fc domains) by altering the nucleic acids that encode them by, e.g., saturation mutagenesis,
an optimized directed evolution system, synthetic ligation reassembly, or a combination
thereof.

The invention also provides libraries of chimeric antigen binding polypeptides 
encoded by the nucleic acid libraries of the invention and generated by the methods of the 
invention. These antigen binding polypeptides can be analyzed using any liquid or solid state 
screening method, e.g., phage display, ribosome display, using capillary array platforms, and
the like. The polypeptides generated by the methods of the invention can be used in vitro,
e.g., to isolate or identify antigens, or in vivo, e.g., to treat or diagnose various diseases and
conditions, to modulate, stimulate or attenuate an immune response. The invention also is 
directed to the generation of chimeric immunoglobulins for administering passive immunity 
and nucleic acids encoding these chimeric antigen binding molecules for genetic vaccines. 

BACKGROUND

Synthetic oligonucleotides are commonly used to construct nucleic acids,
including polypeptide coding sequences and gene constructs. However, even the best
oligonucleotide synthesizer has a 1% to 5% error rate. These errors can result in improper
base pair sequences, which can lead to generation of erroneous protein sequences. These
errors can also result in sequences that cannot be properly transcribed or translated,
e.g., because of premature stop codons. To detect these errors, the oligonucleotides or the
sequences generated using the oligonucleotides are sequenced. However, sequencing to
detect errors in nucleic acid synthetic techniques is time consuming and expensive.

Engineering genes, polypeptide coding sequences and other polynucleotide
molecules can be impeded by the need to isolate, synthesize or handle a parental, or template,
DNA sequence. For example, it may be necessary to alter codon usage for optimal
expression in a cell host, requiring manipulation of the polynucleotide sequence. Frequently
it is desirable or necessary to add and/or remove restriction sites to an isolated, cloned or
amplified polynucleotide to facilitate manipulation of the sequence, requiring further
modification of the molecule. All of these manipulations introduce labor costs and are
potential sources of sequence and cloning errors.

Even the best quality oligonucleotide synthesis systems available still yield up to
1% of (n-1) and (n-2) contaminants, leading to a high error rate in the nucleic acid
sequences (e.g., genes, gene pathways, or regulatory motifs) built from them. These errors
can manifest themselves as frame shifts or as stop codons, resulting in truncated proteins if
the engineered gene is expressed. Sometimes, more than 20 clones have to be sequenced and
errors corrected (e.g., by site directed mutagenesis) to get the desired nucleotide sequence for a
single gene or coding sequence. In the case of chimeric polynucleotide libraries, sequencing
and correcting all errors is not an option, and oligo-based sequence errors decrease cloning
and screening efficiency significantly.

Antigen binding polypeptides, such as antibodies, are increasingly used in a
variety of therapeutic applications. For example, in immunotherapy, antibodies are used to
directly kill target cells, such as cancer cells. They can be administered to generate passive
immunity. Antigen binding polypeptides are also used as carriers to deliver cytotoxic or
imaging reagents. Monoclonal antibodies (mAbs) approved for cancer therapy are now in
Phase II and III trials. Certain anti-idiotypic antibodies that bind to the antigen-combining
sites of antibodies can effectively mimic the three-dimensional structures and functions of the 
external antigens and can be used as surrogate antigens for active specific immunotherapy. 
Bi-specific antibodies combine immune cell activation with tumor cell recognition; thus, 
tumor cells or cells expressing tumor specific antigens (e.g., tumor vasculature) are killed by 
pre-defined effector cells. Antibodies can be administered to increase or decrease the levels 
of cytokines or hormones by direct binding or by stimulating or inhibiting secretory cells. 
Accordingly, increasing the affinity or avidity of an antibody to a desired antigen, such as a 
cancer-specific antigen, would result in greater specificity of the antibody to its target, 
resulting in a variety of therapeutic benefits, such as needing to administer less antibody- 
containing pharmaceutical. 

SUMMARY 

METHODS FOR PURIFYING AND IDENTIFYING DOUBLE-STRANDED NUCLEIC 
ACIDS LACKING BASE PAIR MISMATCHES, INSERTION/DELETION LOOPS OR 
NUCLEOTIDE GAPS 

The invention provides methods for identifying and purifying double-stranded
polynucleotides lacking nucleotide gaps, base pair mismatches and insertion/deletion loops.
In one aspect, the invention provides methods for purifying double-stranded polynucleotides
lacking base pair mismatches, insertion/deletion loops and/or nucleotide gaps comprising the
following steps: (a) providing a plurality of polypeptides that specifically bind to a base pair
mismatch, an insertion/deletion loop and/or a nucleotide gap or gaps within a double-stranded
polynucleotide; (b) providing a sample comprising a plurality of double-stranded
polynucleotides; (c) contacting the double-stranded polynucleotides of step (b) with the
polypeptides of step (a) under conditions wherein a polypeptide of step (a) can specifically
bind to a base pair mismatch, an insertion/deletion loop and/or a nucleotide gap or gaps in a
double-stranded polynucleotide of step (b); and (d) separating the double-stranded
polynucleotides lacking a specifically bound polypeptide of step (a) from the double-stranded
polynucleotides to which a polypeptide of step (a) has specifically bound, thereby purifying
double-stranded polynucleotides lacking base pair mismatches, insertion/deletion loops
and/or nucleotide gaps. In one aspect, the double-stranded polynucleotide comprises a
double-stranded oligonucleotide. In one aspect, the double-stranded polynucleotide consists
of a double-stranded oligonucleotide.
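
By way of illustration only, the logic of steps (a) through (d) can be mirrored in software. The sketch below is an in-silico analogue, not the claimed biochemical method: a function stands in for the defect-binding polypeptide, and a partition step stands in for the physical separation. The function names, the aligned-string representation and the use of "-" to mark a missing nucleotide are all assumptions made for this example.

```python
# In-silico analogue only. Names and the '-' convention are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def find_defects(top, bottom):
    """Return (position, kind) for every defect in two aligned strands.

    `top` is written 5'->3' and `bottom` 3'->5', so position i of one
    strand faces position i of the other. '-' marks a missing nucleotide.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must be given as an equal-length alignment")
    defects = []
    for i, (t, b) in enumerate(zip(top, bottom)):
        if t == "-" or b == "-":
            defects.append((i, "gap_or_loop"))
        elif COMPLEMENT.get(t) != b:
            defects.append((i, "mismatch"))
    return defects


def partition(duplexes):
    """Split duplexes into (defect_free, flagged), mirroring step (d)."""
    defect_free, flagged = [], []
    for top, bottom in duplexes:
        if find_defects(top, bottom):
            flagged.append((top, bottom))
        else:
            defect_free.append((top, bottom))
    return defect_free, flagged


sample = [
    ("ATGGCC", "TACCGG"),  # perfectly paired
    ("ATGGCC", "TACTGG"),  # G facing T at position 3: a G:T mismatch
    ("ATG-CC", "TACCGG"),  # nucleotide missing from the top strand
]
clean, flagged = partition(sample)
```

Here `clean` plays the role of the purified fraction and `flagged` the fraction retained with bound polypeptide.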

In alternative aspects, the double-stranded polynucleotide is between about 3
and about 300 base pairs in length; between 10 and about 200 base pairs in length; or
between 50 and about 150 base pairs in length. In alternative aspects, the gaps in the
double-stranded polynucleotide are between about 1 and 30, about 2 and 20, about 3 and 15,
about 4 and 12, or about 5 and 10 nucleotides in length.

In alternative aspects, the base pair mismatch comprises a C:T mismatch, a
G:A mismatch, a C:A mismatch or a G:U/T mismatch. 

In one aspect, the polypeptide that specifically binds to a base pair mismatch,
an insertion/deletion loop and/or a nucleotide gap or gaps in a double-stranded
polynucleotide comprises a DNA repair enzyme. In alternative aspects, the DNA repair
enzyme is a bacterial DNA repair enzyme, a MutS DNA repair enzyme, a Taq MutS DNA
repair enzyme, an Fpg DNA repair enzyme, a MutY DNA repair enzyme, a hexA DNA
mismatch repair enzyme, a Vsr mismatch repair enzyme, a mammalian DNA repair enzyme,
or a natural or synthetic variation or isozyme thereof. In one aspect, the DNA repair
enzyme is a DNA glycosylase that initiates base-excision repair of G:U/T mismatches. The
DNA glycosylase can comprise a bacterial mismatch-specific uracil-DNA glycosylase
(MUG) DNA repair enzyme or a eukaryotic thymine-DNA glycosylase (TDG) enzyme.

In one aspect, the separating of the double-stranded polynucleotides lacking a
specifically bound polypeptide of step (a) from the double-stranded polynucleotides to which
a polypeptide of step (a) has specifically bound of step (d) comprises use of an
immunoaffinity column, wherein the column comprises immobilized antibodies capable of
specifically binding to the specifically bound polypeptide or an epitope bound to the
specifically bound polypeptide, and the sample is passed through the immunoaffinity column
under conditions wherein the immobilized antibodies are capable of specifically binding to
the specifically bound polypeptide or the epitope bound to the specifically bound
polypeptide.

In one aspect, the separating of the double-stranded polynucleotides lacking a
specifically bound polypeptide of step (a) from the double-stranded polynucleotides to which
a polypeptide of step (a) has specifically bound of step (d) comprises use of an antibody,
wherein the antibody is capable of specifically binding to the specifically bound polypeptide
or an epitope bound to the specifically bound polypeptide and the antibody is contacted with
the specifically bound polypeptide under conditions wherein the antibody is capable of
specifically binding to the specifically bound polypeptide or an epitope bound to the
specifically bound polypeptide. The antibody can be an immobilized antibody. The
antibody can be immobilized onto a bead or a magnetized particle or a magnetized bead.

In one aspect, the separating of the double-stranded polynucleotides lacking a
specifically bound polypeptide of step (a) from the double-stranded polynucleotides to which
a polypeptide of step (a) has specifically bound of step (d) comprises use of an affinity
column, wherein the column comprises immobilized binding molecules capable of
specifically binding to a tag linked to the specifically bound polypeptide and the sample is
passed through the affinity column under conditions wherein the immobilized binding
molecules are capable of specifically binding to the tag linked to the specifically bound
polypeptide. The immobilized binding molecules can comprise an avidin or a natural or
synthetic variation or homologue thereof and the tag linked to the specifically bound
polypeptide can comprise a biotin or a natural or synthetic variation or homologue thereof.

In one aspect, the separating of the double-stranded polynucleotides lacking a
specifically bound polypeptide of step (a) from the double-stranded polynucleotides to which
a polypeptide of step (a) has specifically bound of step (d) comprises use of a size exclusion
column, such as a spin column. Alternatively, the separating can comprise use of a size
exclusion gel, such as an agarose gel.

In one aspect, the double-stranded polynucleotide comprises a polypeptide
coding sequence. The polypeptide coding sequence can comprise a fusion protein coding
sequence. The fusion protein can comprise a polypeptide of interest upstream of an intein,
wherein the intein comprises a polypeptide. The intein polypeptide can comprise an enzyme,
such as one used to identify vector or insert positive clones, such as LacZ. The intein
polypeptide can comprise an antibody or a ligand. In one aspect, the intein polypeptide
comprises a polypeptide selectable marker, such as an antibiotic. The antibiotic can
comprise a kanamycin, a penicillin or a hygromycin.

The invention provides a method for assembling double-stranded 
oligonucleotides to generate a polynucleotide lacking base pair mismatches, 
insertion/deletion loops and/or nucleotide gaps comprising the following steps: (a) 
providing a plurality of polypeptides that specifically bind to a base pair mismatch, an 
insertion/deletion loop and/or a nucleotide gap or gaps in a double stranded polynucleotide; 
(b) providing a sample comprising a plurality of double-stranded oligonucleotides; (c) 
contacting the double-stranded oligonucleotides of step (b) with the polypeptides of step (a) 
under conditions wherein a polypeptide of step (a) can specifically bind to a base pair 
mismatch, an insertion/deletion loop and/or a nucleotide gap or gaps in a double stranded 
oligonucleotide of step (b); (d) separating the double-stranded oligonucleotides lacking a 
specifically bound polypeptide of step (a) from the double-stranded oligonucleotides to 
which a polypeptide of step (a) has specifically bound, thereby purifying double-stranded 
oligonucleotides lacking base pair mismatches, insertion/deletion loops and/or a nucleotide 
gap or gaps; and (e) joining together the purified double-stranded oligonucleotides lacking 
base pair mismatches and insertion/deletion loops, thereby generating a polynucleotide 
lacking base pair mismatches, insertion/deletion loops and/or nucleotide gaps. 

In one aspect, the double-stranded oligonucleotides comprise libraries of 
oligonucleotides, e.g., the libraries of the invention comprising oligonucleotides comprising 
multicodons. For example, the double-stranded oligonucleotides can comprise libraries of 
oligonucleotides comprising multicodon, e.g., dicodon, building blocks. In one aspect, the 
library comprises a plurality of double-stranded oligonucleotide members, wherein each 
oligonucleotide member comprises two or more codons in tandem (e.g., a dicodon) and a 
Type-IIS restriction endonuclease recognition sequence flanking the 5' and the 3' end of the 
multicodon (e.g., dicodon, tricodon, tetracodon, and the like). 

The invention provides a method for generating a polynucleotide lacking base 
pair mismatches, insertion/deletion loops and/or nucleotide gaps comprising the following 
steps: (a) providing a plurality of polypeptides that specifically bind to a base pair 
mismatch, an insertion/deletion loop and/or a nucleotide gap or gaps in a double-stranded
polynucleotide; (b) providing a sample comprising a plurality of double-stranded
oligonucleotides; (c) joining together the double-stranded oligonucleotides of step (b) to
generate a double-stranded polynucleotide; (d) contacting the double-stranded
polynucleotide of step (c) with the polypeptides of step (a) under conditions wherein a
polypeptide of step (a) can specifically bind to a base pair mismatch, an insertion/deletion
loop and/or a nucleotide gap or gaps in a double-stranded polynucleotide of step (c); and (e)
separating the double-stranded polynucleotides lacking a specifically bound polypeptide of
step (a) from the double-stranded polynucleotides to which a polypeptide of step (a) has
specifically bound, thereby purifying double-stranded polynucleotides lacking base pair
mismatches, insertion/deletion loops and/or nucleotide gaps. In one aspect, the
double-stranded oligonucleotides comprise a library of oligonucleotide multicodon building blocks,
the library comprising a plurality of double-stranded oligonucleotide members, wherein each
oligonucleotide member comprises at least two codons in tandem and a Type-IIS restriction
endonuclease recognition sequence flanking the 5' and the 3' end of the multicodon.

In one aspect, the method further comprises providing a set of 61 immobilized
starter oligonucleotides, one oligonucleotide for each possible amino acid coding triplet,
wherein the oligonucleotides are immobilized on a substrate and have a single-stranded
overhang corresponding to a single-stranded overhang generated by a Type-IIS restriction
endonuclease, or, the oligonucleotides comprise a Type-IIS restriction endonuclease
recognition site distal to the substrate and a single-stranded overhang is generated by
digestion with a Type-IIS restriction endonuclease; digesting a second oligonucleotide
member from the library of step (a) with a Type-IIS restriction endonuclease to generate a
single-stranded overhang; and contacting the digested second oligonucleotide member to the
immobilized first oligonucleotide member under conditions wherein complementary
single-stranded base overhangs of the first and the second oligonucleotides can pair, and, ligating
the second oligonucleotide to the first oligonucleotide, thereby generating a double-stranded
polynucleotide.

The invention provides a method for generating a base pair mismatch-free,
insertion/deletion loop-free and/or gap-free double-stranded polypeptide coding sequence
comprising the following steps: (a) providing a plurality of polypeptides that specifically
bind to a base pair mismatch, an insertion/deletion loop and/or a nucleotide gap or gaps
within a double-stranded polynucleotide; (b) providing a sample comprising a plurality of
double-stranded polynucleotides encoding a fusion protein, wherein the fusion protein coding
sequence comprises a coding sequence for a polypeptide of interest upstream of and in frame
with a coding sequence for a marker or a selection polypeptide; (c) contacting the
double-stranded polynucleotides of step (b) with the polypeptides of step (a) under conditions
wherein a polypeptide of step (a) can specifically bind to a base pair mismatch, an
insertion/deletion loop and/or a nucleotide gap or gaps in a double-stranded polynucleotide of step (b);
(d) separating the double-stranded polynucleotides lacking a specifically bound polypeptide
of step (a) from the double-stranded polynucleotides to which a polypeptide of step (a) has
specifically bound, thereby purifying double-stranded polynucleotides lacking base pair
mismatches, insertion/deletion loops and/or a nucleotide gap or gaps; and (e) expressing the
purified double-stranded polynucleotides and selecting the polynucleotides expressing the
selection marker polypeptide, thereby generating a base pair mismatch-free,
insertion/deletion loop-free and/or gap-free double-stranded polypeptide coding sequence.

In one aspect, the marker or selection polypeptide comprises a self-splicing 
intein, and the method further comprises the self-splicing out of the intein marker or selection 
polypeptide from the upstream polypeptide of interest. The marker or selection polypeptide 
can comprise an enzyme, such as an enzyme used to identify insert- or vector-positive clones,
such as a LacZ enzyme. The marker or selection polypeptide can also comprise an antibiotic, 
such as a kanamycin, a penicillin or a hygromycin. 

In alternative aspects of the invention, the methods generate a sample or
"batch" of purified oligonucleotides and/or polynucleotides that are 90%, 95%, 96%, 97%,
98%, 99%, 99.5% or 100% (completely) free of base pair mismatches, insertion/deletion
loops and/or a nucleotide gap or gaps.

Nucleic acids manipulated or altered by any means, including random or
stochastic methods, or non-stochastic or "directed evolution" methods, can be "purified" or
"processed" by the methods of the invention, e.g., the methods of the invention can be used
to generate a sample or "batch" of double-stranded oligonucleotides and/or polynucleotides
that are 90%, 95%, 96%, 97%, 98%, 99%, 99.5% or 100% (completely) free of base pair
mismatches, insertion/deletion loops and/or a nucleotide gap or gaps, wherein the nucleic
acids (e.g., oligos, polynucleotides, genes, and the like) have been manipulated by stochastic
methods, or by non-stochastic or "directed evolution" methods. For example, the methods of the
invention can be used to "purify" or "process" nucleic acids manipulated by saturation 
mutagenesis, an optimized directed evolution system, synthetic ligation reassembly, or a 
combination thereof, as described herein. The methods of the invention can be used to 
"purify" or "process" nucleic acids manipulated by a method comprising gene site saturated 
mutagenesis (GSSM). The methods of the invention can be used to "purify" or "process" 
nucleic acids manipulated by gene site saturated mutagenesis (GSSM), step-wise nucleic acid 
reassembly, error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly 
PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive 
ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene 
reassembly, synthetic ligation reassembly (SLR) or a combination thereof. The methods of 
the invention can be used to "purify" or "process" nucleic acids manipulated by 
recombination, recursive sequence recombination, phosphorothioate-modified DNA
mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point 
mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, 
radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction- 
purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic 
acid multimer creation or a combination thereof. 

In one aspect, a method of the invention comprises purifying a double-stranded
nucleic acid comprising a synthetic, a naturally isolated, or a recombinantly generated
nucleic acid (a polynucleotide or an oligonucleotide). The synthetic polynucleotide can be
identical to a parental or a natural sequence. In one aspect, the polynucleotide comprises a
gene or a chromosome. In one aspect, the gene further comprises a pathway. In one aspect, the
gene comprises a regulatory sequence. In one aspect, the polynucleotide comprises a 
promoter or an enhancer or a polypeptide coding sequence. The polypeptide can be an 
enzyme, an antibody, a receptor, a neuropeptide, a chemokine, a hormone, a signal sequence, 
or a structural gene. In one aspect, the polynucleotide comprises non-coding sequence. 

In one aspect, a polynucleotide purified by a method of the invention 
comprises a DNA (e.g., a gene or coding sequence), an RNA (e.g., an iRNA, an rRNA, a 
tRNA or an mRNA) or a combination thereof. For example, the methods of the invention 
can be used to generate a sample or "batch" of double-stranded DNA or RNA that is 90%,
95%, 96%, 97%, 98%, 99%, 99.5% or 100% (completely) free of base pair mismatches,
insertion/deletion loops and/or a nucleotide gap or gaps. In one aspect, the double-stranded 
polynucleotide comprises an iRNA. The double-stranded polynucleotide can comprise a 
DNA, e.g., a gene. In one aspect, the DNA comprises a chromosome. 

COMPOSITIONS AND METHODS FOR MAKING POLYNUCLEOTIDES BY 
ASSEMBLY OF CODON BUILDING BLOCKS 

The invention provides methods and compositions for making nucleic acids 
by iterative assembly of oligonucleotide building blocks. In one aspect, the invention 
provides libraries of oligonucleotides comprising multicodon (e.g., dicodon, tricodon) 
building blocks. In one aspect, the library comprises a plurality of double-stranded 
oligonucleotide members, wherein each oligonucleotide member comprises two or more 
codons in tandem (e.g., a dicodon) and a Type-IIS restriction endonuclease recognition 
sequence flanking the 5' and the 3' end of the multicodon (e.g., dicodon, tricodon, 
tetracodon, and the like). 

In different aspects, this invention provides that the building blocks can be
X-mers (where X can be any integer from 3 to one billion). In other aspects, six-mers can be used
that are not dicodons prior to assembly with other building blocks (because they are
frame-shifted), but that can become codons after assembly with other building blocks. In other
aspects, the intended product is not a coding sequence (but may be, e.g. a promoter, an 
enhancer, or any other regulatory motif), so the building blocks do not need to function as 
codons either before or after assembly with other building blocks. In other aspects, the 
assembly product can be, e.g., operons, gene pathways, chromosomes, or genomes. Thus, 
the term "codon" includes all nucleic acid sequences, including sequences that code for "non- 
coding" sequences such as regulatory motifs (e.g., promoters, enhancers), operons, structural 
sequences (e.g., telomeres) and the like. 

In one aspect, the library comprises oligonucleotide members comprising all 
possible codon combinations, e.g., all possible dimer (dicodon) combinations, tricodon 
combinations, tetracodon combinations, and the like. In one aspect, the library of the 
invention can comprise oligonucleotide members comprising 4096 different possible codon 
dimer (dicodon) combinations (proteins are synthesized according to base triplets (codons) in 
a given DNA sequence; there are 61 different triplets coding for 20 different amino acids).
The library can be of any size and can include anywhere from one to 4096 different
members, e.g., the library can comprise about 50, 100, 150, 200, 250, 300, 350, 400, 450, 
500, 600, 700, 800, 900, 1000, 2000, 3000, 4000 or more different members. In one aspect, 
none of the codons are stop codons. 
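
The library sizes quoted above can be checked by direct enumeration. The sketch below assumes the standard genetic code, in which TAA, TAG and TGA are the three stop codons. It shows that 4096 is the number of ordered pairs drawn from all 64 triplets, whereas a library restricted to the 61 amino acid coding triplets would contain 3721 dicodons. The variable and function names are illustrative only.

```python
from itertools import product

BASES = "ACGT"
# Stop codons under the standard genetic code (DNA alphabet); an assumption
# of this sketch, since other genetic codes differ.
STOP_CODONS = {"TAA", "TAG", "TGA"}

all_codons = ["".join(c) for c in product(BASES, repeat=3)]
sense_codons = [c for c in all_codons if c not in STOP_CODONS]


def dicodons(codons):
    """Every ordered pair of codons, joined in tandem as a six-mer."""
    return [a + b for a, b in product(codons, repeat=2)]


print(len(all_codons), len(sense_codons))  # 64 61
print(len(dicodons(all_codons)))           # 4096
print(len(dicodons(sense_codons)))         # 3721
```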

In one aspect, the Type-IIS restriction endonuclease recognition sequence at
the 5' end of the dicodon differs from the Type-IIS restriction endonuclease recognition
sequence at the 3' end of the dicodon. The Type-IIS restriction endonuclease recognition
sequence can be specific for a restriction endonuclease that, upon digestion of the
oligonucleotide library member, generates a base overhang, including a one base
single-stranded overhang, a two base single-stranded overhang, a three base single-stranded
overhang, a four base single-stranded overhang, and the like. The restriction endonuclease
can comprise a SapI restriction endonuclease or an isoschizomer thereof, or, an EarI
restriction endonuclease or an isoschizomer thereof. In one aspect, the Type-IIS restriction
endonuclease recognition sequence is specific for a restriction endonuclease that, upon
digestion of the oligonucleotide library member, generates a two base single-stranded
overhang. The restriction endonuclease can be a BseRI, a BsgI or a BpmI restriction
endonuclease. In one aspect, the Type-IIS restriction endonuclease recognition sequence is
specific for a restriction endonuclease that, upon digestion of the oligonucleotide library
member, generates a one base single-stranded overhang. The restriction endonuclease can be
an N.AlwI or an N.BstNBI restriction endonuclease.

In one aspect, the Type-IIS restriction endonuclease recognition sequence is
specific for a restriction endonuclease that, upon digestion of the oligonucleotide library
member, cuts on both sides of the Type-IIS restriction endonuclease recognition sequence.
The restriction endonuclease can be a BcgI, a BsaXI or a BspCNI restriction endonuclease.

In one aspect, each oligonucleotide library member consists essentially of two 
codons in tandem (a dicodon) and a Type-IIS restriction endonuclease recognition sequence
flanking the 5' and the 3' end of the dicodon.

In alternative aspects, the oligonucleotide library members are between about 
20 and 400 base pairs in length, between about 40 and 200 base pairs in length or between 
about 100 and 150 base pairs in length.

The oligonucleotide library member can comprise a (complementary base
paired) sequence (NNN)(NNN) AGAAGAGC (SEQ ID NO:1) and (NNN)(NNN)
TCTTCTCG (SEQ ID NO:2), wherein (NNN) is a codon and N is A, C, T or G or an 
equivalent thereof. 

The oligonucleotide library member can comprise a (complementary base 
paired) sequence (NNN)(NNN) TGAAGAGAG (SEQ ID NO:3) and (NNN)(NNN)
ACTTCTCTC (SEQ ID NO:4), wherein (NNN) is a codon and N is A, C, T or G or an 
equivalent thereof. 

The oligonucleotide library member can comprise a (complementary base 
paired) sequence (NNN)(NNN) TGAAGAGAG CT GCTACTAACT GCA (SEQ ID NO:5) 
and (NNN) (NNN) ACTTCTCTC GA CGATGATTG (SEQ ID NO:6), wherein (NNN) is a 
codon and N is A, C, T or G or an equivalent thereof. 

The oligonucleotide library member can comprise a (complementary base 
paired) sequence CTCTCTTCA NNN NNN AGAAGAGC (SEQ ID NO:7) and 
GAGAGAAGT NNN NNN TCTTCTCG (SEQ ID NO:8), wherein (NNN) is a codon and N 
is A, C, T or G or an equivalent thereof. 

The oligonucleotide library member can comprise a (complementary base 
paired) sequence CTCTCTTCA NNN NNN AGAAGAGC GGGTCTTCCAACT 
AGAGAATTCGATATCTGCA (SEQ ID NO:9) and GAGAGAAGT NNN NNN 
TCTTCTCG CCCAGAAGGTTGATCTCTTAAGCTATAG (SEQ ID NO:10), wherein 
(NNN) is a codon and N is A, C, T or G or an equivalent thereof. 
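
As a non-limiting illustration, a library member following the pattern of SEQ ID NO:7 and SEQ ID NO:8 can be written out for a chosen dicodon. The sketch below takes the two flanking sequences exactly as printed in SEQ ID NO:7 and derives the paired strand base by base, written in the same left-to-right order as the text uses. The function names and the example codons ATG and GCC are assumptions introduced for the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

LEFT_FLANK = "CTCTCTTCA"  # 5' flank as printed in SEQ ID NO:7
RIGHT_FLANK = "AGAAGAGC"  # 3' flank as printed in SEQ ID NO:7


def paired_strand(seq):
    """Base-by-base complement, kept in the same left-to-right order."""
    return "".join(COMPLEMENT[b] for b in seq)


def library_member(codon1, codon2):
    """Return (top, bottom) strands for a dicodon between the two flanks."""
    for codon in (codon1, codon2):
        if len(codon) != 3 or set(codon) - set(COMPLEMENT):
            raise ValueError(f"not a DNA codon: {codon!r}")
    top = LEFT_FLANK + codon1 + codon2 + RIGHT_FLANK
    return top, paired_strand(top)


top, bottom = library_member("ATG", "GCC")
print(top)     # CTCTCTTCAATGGCCAGAAGAGC
print(bottom)  # GAGAGAAGTTACCGGTCTTCTCG
```

The flanks of the derived bottom strand, GAGAGAAGT and TCTTCTCG, agree with those printed in SEQ ID NO:8.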

The invention provides a method for building a polynucleotide comprising 
codons by iterative assembly of multicodon (e.g., dicodon) building blocks. In one aspect, 
the method comprises the following steps: (a) providing a library of double-stranded codon 
building block oligonucleotides of the invention; (b) providing a substrate surface; (c) 
immobilizing a first oligonucleotide member from the library of step (a) to the substrate 
surface of step (b) and digesting with a Type-IIS restriction endonuclease to generate a 
single-stranded overhang in a codon, or, digesting a first oligonucleotide member from the 
library of step (a) with a Type-IIS restriction endonuclease to generate a single-stranded 
overhang in a codon and immobilizing to the substrate surface of step (b) by the 
oligonucleotide end opposite the codon; (d) digesting a second oligonucleotide member
from the library of step (a) with a Type-IIS restriction endonuclease to generate a
single-stranded overhang in a codon; and (e) contacting the digested second oligonucleotide
member of step (d) to the digested immobilized first oligonucleotide member of step (c)
under conditions wherein complementary single-stranded base overhangs of the first and the
second oligonucleotides can pair, and, ligating the second oligonucleotide to the first
oligonucleotide; thereby building a polynucleotide comprising codons by iterative assembly
of multicodon (e.g., dicodon) building blocks.
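
The bookkeeping behind steps (c) through (e) can be sketched in software under a simplifying assumption that is not stated in the text: the single-stranded overhang is taken to span exactly one codon, so that each incoming dicodon overlaps the growing chain by one codon and contributes one new codon. In this toy model, string equality of codons stands in for overhang complementarity, and appending stands in for ligation. All names are hypothetical, and enzyme recognition sites, strand polarity and the substrate are deliberately left out.

```python
def blocks_for(target):
    """Choose a starter codon and overlapping dicodon blocks for a target.

    Assumes a one-codon overlap between consecutive blocks (a modelling
    assumption of this sketch, not a statement of the method).
    """
    if len(target) % 3 != 0 or len(target) < 3:
        raise ValueError("target must be a whole number of codons")
    codons = [target[i:i + 3] for i in range(0, len(target), 3)]
    dicodon_blocks = [a + b for a, b in zip(codons, codons[1:])]
    return codons[0], dicodon_blocks


def assemble(starter_codon, dicodon_blocks):
    """Extend an immobilized starter one block at a time."""
    chain = [starter_codon]
    for block in dicodon_blocks:
        if len(block) != 6:
            raise ValueError(f"not a dicodon: {block!r}")
        first, second = block[:3], block[3:]
        # The exposed codon on the chain must match the block's first codon,
        # standing in for pairing of complementary overhangs.
        if first != chain[-1]:
            raise ValueError(f"overhang {first} cannot pair with {chain[-1]}")
        chain.append(second)  # "ligation": the new codon becomes the exposed end
    return "".join(chain)


starter, blocks = blocks_for("ATGGCCAAA")
product_seq = assemble(starter, blocks)
```

For the target ATGGCCAAA, the starter is ATG, the blocks are ATGGCC and GCCAAA, and the assembled sequence equals the target.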

The methods of the invention can further comprise digesting the immobilized
oligonucleotide of step (e) with a Type-IIS restriction endonuclease to generate a
single-stranded overhang in a codon, wherein the Type-IIS restriction endonuclease recognizes a
restriction endonuclease recognition sequence in the oligonucleotide distal to the substrate
surface. The methods of the invention can further comprise digesting another
oligonucleotide member from the library of step (a) with a Type-IIS restriction endonuclease
to generate a single-stranded overhang in a codon. The methods of the invention can further
comprise contacting a digested oligonucleotide library member to a digested immobilized
oligonucleotide member under conditions wherein complementary single-stranded base
overhangs of the oligonucleotides can pair, and, ligating the oligonucleotides; thereby
building a polynucleotide comprising codons by iterative assembly of multicodon (e.g.,
dicodon) building blocks.

In one aspect, the method is repeated iteratively, thereby building a
polynucleotide comprising a plurality of codons. The method can be iteratively repeated n
times, wherein n is an integer between 2 and 10^6 or more. The method can be iteratively
repeated n times, wherein n is an integer between 10^2 and 10^5.

In one aspect, a member of the library is randomly selected for iterative
assembly to the polynucleotide. All or a subset of the members of the library added to the
polynucleotide can be selected randomly. 

In one aspect, a member of the library is non-stochastically selected for 
iterative assembly to the polynucleotide. All or a subset of the members of the library added 
to the polynucleotide can be selected non-stochastically. 

In one aspect, the library of oligonucleotides comprises all possible codon
combinations, e.g., dimer (dicodon) combinations, tricodon combinations and the like. In
one aspect, the library of oligonucleotides consists of 4096 codon dimer (dicodon)
combinations. In one aspect, the codons are not stop codons.
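
The figure of 4096 follows from 64 possible codons taken two at a time (64 x 64). As a short illustrative sketch, the enumeration below also counts the dicodons that remain when the three stop codons of the standard genetic code are excluded; the variable names are hypothetical and the use of the standard genetic code is an assumption.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code

# All 64 triplets, and the 61 triplets that encode an amino acid.
codons = ["".join(c) for c in product(BASES, repeat=3)]
sense_codons = [c for c in codons if c not in STOP_CODONS]

# Every ordered pair of codons is one dicodon building block.
all_dicodons = [a + b for a, b in product(codons, repeat=2)]
sense_dicodons = [a + b for a, b in product(sense_codons, repeat=2)]

print(len(codons), len(all_dicodons))              # 64 4096
print(len(sense_codons), len(sense_dicodons))      # 61 3721
```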

In one aspect, the substrate surface comprises a solid surface. The solid 
surface can comprise a bead. The solid surface can comprise a polystyrene or a glass. In one 
aspect, the solid surface comprises a double-orificed container. The double-orificed 
container can comprise a double-orificed capillary array. The double-orificed capillary array 
can be a GIGAMATRIX™ capillary array. 

In one aspect, the substrate surface of step (b) further comprises an 
immobilized double-stranded oligonucleotide. The immobilized double-stranded 
oligonucleotide can further comprise a codon building block oligonucleotide library member
of the invention. The codon building block oligonucleotide library member can be 
immobilized to the immobilized double-stranded oligonucleotide by blunt end ligation. 

In one aspect, the immobilized double-stranded oligonucleotide comprises a
single-stranded base overhang at the non-immobilized end of the oligonucleotide. The
oligonucleotide library member can be immobilized to the immobilized double-stranded 
oligonucleotide by base pairing of single stranded base overhangs followed by ligation. 

In one aspect, the Type-IIS restriction endonuclease recognition sequence at
the 5' end of the multicodon (e.g., dicodon) differs from the Type-IIS restriction
endonuclease recognition sequence at the 3' end of the multicodon (e.g., dicodon).

In one aspect, the Type-IIS restriction endonuclease upon digestion of the
oligonucleotide library member generates a three base single-stranded overhang. The
Type-IIS restriction endonuclease comprises a SapI restriction endonuclease or an isoschizomer
thereof, or, an EarI restriction endonuclease or an isoschizomer thereof.

In one aspect, the Type-IIS restriction endonuclease upon digestion of the
oligonucleotide library member generates a two base single-stranded overhang. The
Type-IIS restriction endonuclease can be a BseRI, a BsgI or a BpmI restriction endonuclease or an
isoschizomer thereof.

In one aspect, the Type-IIS restriction endonuclease upon digestion of the
oligonucleotide library member generates a one base single-stranded overhang. The
Type-IIS restriction endonuclease can be a N.AlwI or a N.BstNBI restriction endonuclease or an
isoschizomer thereof.

In one aspect, the Type-IIS restriction endonuclease upon digestion of the
oligonucleotide library member cuts on both sides of the Type-IIS restriction endonuclease
recognition sequence. The Type-IIS restriction endonuclease can be a BcgI, a BsaXI or a
BspCNI restriction endonuclease or an isoschizomer thereof.

In one aspect, each library member consists essentially of two codons in 
tandem (a dicodon) and a Type-IIS restriction endonuclease recognition sequence flanking 
the 5' and the 3' end of the dicodon. In alternative aspects, each library member can be 
three, four, five, six or more codons in tandem and a Type-IIS restriction endonuclease 
recognition sequence flanking the 5' and the 3' end of the multicodon.

In alternative aspects, the oligonucleotide library members are between about 
20 and 400 or more base pairs in length, between about 40 and 200 base pairs in length, or
between about 100 and 150 base pairs in length.

In one aspect, an oligonucleotide library member comprises a sequence 
(NNN)(NNN) AGAAGAGC (SEQ ID NO: 1) and (NNN)(NNN) TCTTCTCG (SEQ ID 
NO:2), wherein (NNN) is a codon and N is A, C, T or G or an equivalent thereof. 
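
The paired sequences in this and the following aspects are written so that each base of the first strand sits opposite its partner on the second strand. A minimal sketch of a position-by-position pairing check follows; the helper name `is_base_paired` is hypothetical, and treating N as pairing with N is a placeholder convention adopted only for this illustration.

```python
# Position-wise Watson-Crick pairing check for two strands written in register.
# Assumption: "N" is a placeholder that pairs with "N" on the opposite strand.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}

def is_base_paired(top, bottom):
    """Return True if every base of `top` pairs with the base opposite it."""
    top = top.replace(" ", "")
    bottom = bottom.replace(" ", "")
    if len(top) != len(bottom):
        return False
    return all(PAIRS[t] == b for t, b in zip(top, bottom))

# Fixed portions of SEQ ID NO:1 and SEQ ID NO:2 as given in the text.
print(is_base_paired("NNNNNN AGAAGAGC", "NNNNNN TCTTCTCG"))
```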

In one aspect, an oligonucleotide library member comprises a sequence 
(NNN) (NNN) TGAAGAGAG (SEQ ID NO:3) and (NNN) (NNN) ACTTCTCTC (SEQ ID 
NO:4), wherein (NNN) is a codon and N is A, C, T or G or an equivalent thereof. 

In one aspect, an oligonucleotide library member comprises a sequence 
(NNN)(NNN) TGAAGAGAG CT GCTACTAACT GCA (SEQ ID NO:5) and 
(NNN) (NNN) ACTTCTCTC GA CGATG ATTG (SEQ ID NO:6), wherein (NNN) is a 
codon and N is A, C, T or G or an equivalent thereof.

In one aspect, an oligonucleotide library member comprises a sequence 
CTCTCTTCA NNN NNN AGAAGAGC (SEQ ID NO:7) and GAGAGAAGT NNN NNN 
TCTTCTCG (SEQ ID NO:8), wherein (NNN) is a codon and N is A, C, T or G or an
equivalent thereof.
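
As an illustration of how a three base overhang can fall within a codon, the sketch below locates an EarI recognition sequence (CTCTTC, cutting 1 nucleotide downstream on the recognition strand and 4 nucleotides downstream on the opposite strand) in the upper strand of SEQ ID NO:7 and reports the bases left single-stranded. The example codons ATG and GCC and the function name `eari_overhang` are assumptions made for illustration; only the upper strand and only the first site found are modeled.

```python
# Illustrative model of EarI digestion on one strand.
EARI_SITE = "CTCTTC"  # EarI recognition sequence, cut positions (1/4)

def eari_overhang(top_strand):
    """Return the 3-base single-stranded overhang left by the first EarI site."""
    start = top_strand.find(EARI_SITE)
    if start < 0:
        raise ValueError("no EarI site found on this strand")
    site_end = start + len(EARI_SITE)
    top_cut = site_end + 1       # cut 1 nt past the site on the recognition strand
    bottom_cut = site_end + 4    # cut 4 nt past the site on the opposite strand
    return top_strand[top_cut:bottom_cut]

# SEQ ID NO:7 upper strand with hypothetical codons substituted for NNN NNN.
member = "CTCTCTTCA" + "ATG" + "GCC" + "AGAAGAGC"
print(eari_overhang(member))
```

Under these assumptions the overhang coincides with the first codon of the dicodon, consistent with the statement that digestion generates a single-stranded overhang in a codon.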

In one aspect, an oligonucleotide library member comprises a sequence 
CTCTCTTCA NNN NNN AGAAGAGC GGGTCTTCC AACTAGAGAATTCGAT
ATCTGCA (SEQ ID NO:9) and GAGAGAAGT NNN NNN TCTTCTCG CCCAGA 
AGGTTGATCTCTTAAGCTATAG (SEQ ID NO:10), wherein (NNN) is a codon and N is 
A, C, T or G or an equivalent thereof.

In one aspect, the immobilized double-stranded oligonucleotide comprises a
general formula: [Substrate] (linker) (promoter) (restriction site)(single stranded overhang). 
In one aspect, the immobilized double-stranded oligonucleotide comprises a general formula: 
(Y)n (promoter) (restriction site)(single stranded overhang), wherein Y is any nucleotide base 
and n is an integer between 2 and 50, or more. Any promoter can be used, e.g., constitutive 
or inducible. In one aspect, the promoter is a T7 promoter, a T3 promoter or an SP6
promoter. In one aspect, the promoter is directly attached to a substrate, or, is attached by a 
linker, which can be (Y)n nucleotide bases. The attachment to the substrate (the 
immobilization) can be direct or indirect, e.g., by covalent attachment or by hybridization of 
complementary base pairs. 

In one aspect, an immobilized double-stranded oligonucleotide comprises a 
sequence (NNN) (NNN) CGCGCG(Y)nCGAATTGGAGCTC (SEQ ID NO: 11) and 
(NNN) (NNN) GCGCGC(Y)nGCTTAACCTCGAGCCCC (SEQ ID NO: 12), wherein n is 
an integer greater than or equal to 1, Y is any nucleoside and (NNN) is a codon. 

In one aspect, an immobilized double-stranded oligonucleotide comprises a 
sequence (NNN) (NNN) CGCGCGTAATACGACTCACTATAGGGCGAATTG GAGCTC 
(SEQ ID NO:13) and (NNN) (NNN) GCGCGCATTATGCTGAGTGA
TATCCCGCTTAACCTCGAGCCCC (SEQ ID NO:14). 

In one aspect, an immobilized double-stranded oligonucleotide comprises a 
promoter. The promoter can comprise a bacteriophage promoter, such as a T7 promoter, a 
T3 promoter or an SP6 promoter.

In one aspect, ligating the oligonucleotides comprises use of an enzyme, such 
as a ligase. Any ligase can be used, such as a mammalian or a bacterial DNA ligase,
including, e.g., a T4 ligase or an E. coli ligase. 

In one aspect, the methods of the invention further comprise sequencing the 
constructed polynucleotide. The methods of the invention can further comprise determining 
whether all or part of the polynucleotide sequence encodes a peptide or a polypeptide. The 
methods of the invention can further comprise isolating the constructed polynucleotide. The 
methods of the invention can further comprise polymerase-based amplification of the 
constructed polynucleotide. The polymerase-based amplification can be a polymerase chain
reaction (PCR). The methods of the invention can further comprise transcription of the
constructed polynucleotide. 

In one aspect, the solid substrate comprises a double-orificed container. The 
double-orificed container can comprise a double-orificed capillary array. The
double-orificed capillary array can be a GIGAMATRIX™ capillary array.

The invention provides a multiplexed system for building a polynucleotide 
comprising codons by iterative assembly of codon building blocks comprising the following 
components: (a) a library comprising oligonucleotide members, wherein each 
oligonucleotide member comprises multiple codons in tandem, e.g., two codons in tandem (a 
dicodon), and a Type-IIS restriction endonuclease recognition sequence flanking the 5' and
the 3' end of the multicodon (e.g., dicodon); and, (b) a substrate surface comprising a 
plurality of oligonucleotide library members of step (a) immobilized to the substrate surface. 

The invention provides multiplexed systems for building a polynucleotide
comprising codons by iterative assembly of oligonucleotides comprising the following 
components: (a) a library of oligonucleotides of the invention; and (b) a substrate surface 
comprising a plurality of oligonucleotides of step (a) immobilized to the substrate surface. In 
one aspect, the substrate surface can further comprise a double-orificed capillary array. The 
double-orificed capillary array can comprise a GIGAMATRIX™ capillary array. The 
multiplexed system can further comprise instructions comprising all or part of a method of 
the invention. The substrate surface can comprise a plurality of beads, such as magnetic 
beads. In one aspect, the plurality of beads comprises 61 sets of beads, each comprising an 
oligonucleotide comprising a dicodon, one bead set for each possible amino acid coding 
triplet. 

The invention provides kits comprising a plurality of bead sets, each bead set
comprising an immobilized oligonucleotide comprising a multicodon, wherein each
multicodon is flanked by a Type-IIS restriction endonuclease recognition sequence on its
non-immobilized end. 

The invention provides kits comprising a plurality of beads comprising 61 sets 
of beads, each bead comprising an immobilized oligonucleotide comprising an amino acid 
coding triplet, one bead set for each possible amino acid coding triplet, wherein each possible 
amino acid coding triplet is flanked by a Type-IIS restriction endonuclease recognition
sequence on its non-immobilized end. In one aspect, an immobilized oligonucleotide
comprises a promoter. The promoter can comprise a bacteriophage promoter, such as a T7
promoter, a T3 promoter or an SP6 promoter. In one aspect, the kits further comprise an
enzyme, such as a ligase, e.g., a mammalian or a bacterial DNA ligase, including, e.g., a T4
ligase or an E. coli ligase.

These nucleic acids can be further manipulated or altered by any means, 
including random or stochastic methods, or, non-stochastic, or "directed evolution." For 
example, these nucleic acids can be manipulated by saturation mutagenesis, an optimized 
directed evolution system, synthetic ligation reassembly, or a combination thereof, as 
described herein. These nucleic acids can be manipulated by a method comprising gene site 
saturated mutagenesis (GSSM), step-wise nucleic acid reassembly, error-prone PCR, 
shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in 
vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential 
ensemble mutagenesis, site-specific mutagenesis, gene reassembly, synthetic ligation 
reassembly (SLR) or a combination thereof. These nucleic acids can be manipulated by 
recombination, recursive sequence recombination, phosphothioate-modified DNA
mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point 
mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, 
radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction- 
purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic 
acid multimer creation or a combination thereof. 

CHIMERIC ANTIGEN BINDING MOLECULES AND METHODS FOR MAKING AND 
USING THEM 

The invention provides a library of chimeric nucleic acids encoding a plurality 
of chimeric antigen binding polypeptides, the library made by a method comprising the 
following steps: (a) providing a plurality of nucleic acids encoding a lambda light chain 
variable region polypeptide domain (Vλ) or a kappa light chain variable region polypeptide
domain (Vκ); (b) providing a plurality of oligonucleotides encoding a J region polypeptide
domain (VJ); (c) providing a plurality of nucleic acids encoding a lambda light chain
constant region polypeptide domain (Cλ) or a kappa light chain constant region polypeptide
domain (Cκ); (d) joining together a nucleic acid of step (a), a nucleic acid of step (c) and an
oligonucleotide of step (b), wherein the oligonucleotide of step (b) is placed between the
nucleic acids of step (a) and step (c) to generate a V-J-C chimeric nucleic acid coding 
sequence encoding a chimeric antigen binding polypeptide, and repeating this joining step to 
generate a library of chimeric nucleic acid coding sequences encoding a library of chimeric 
antigen binding polypeptides. 
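
The repeated joining step is combinatorial: every V coding sequence can be paired with every J oligonucleotide and every C coding sequence. A minimal sketch follows, using short made-up placeholder sequences rather than real antibody segments; the function name `build_vjc_library` and all segment names are hypothetical.

```python
from itertools import product

# Hypothetical placeholder segments; real inputs would be V and C coding
# sequences (e.g., amplification products) and J region oligonucleotides.
v_segments = {"V1": "GTGGTG", "V2": "GTCGTC"}
j_segments = {"J1": "AACAAC", "J2": "AATAAT", "J3": "AAGAAG"}
c_segments = {"C1": "TGCTGC"}

def build_vjc_library(v, j, c):
    """Join each V, J and C segment in V-J-C order, one chimera per combination."""
    return {
        f"{vn}-{jn}-{cn}": vs + js + cs
        for (vn, vs), (jn, js), (cn, cs) in product(v.items(), j.items(), c.items())
    }

library = build_vjc_library(v_segments, j_segments, c_segments)
print(len(library))  # 2 x 3 x 1 = 6 chimeric coding sequences
```

The library size is the product of the numbers of V, J and C inputs, so modest input sets can yield large chimeric libraries.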

In alternative aspects of the invention, an antigen binding polypeptide 
comprises a single chain antibody, a Fab fragment, an Fd fragment or an antigen binding 
complementarity determining region (CDR). 

The lambda light chain variable region polypeptide domain (Vλ) nucleic acid
coding sequence or the kappa light chain variable region polypeptide domain (Vκ) nucleic
acid coding sequence of step (a) can be generated by an amplification reaction. The lambda
light chain constant region polypeptide domain (Cλ) nucleic acid coding sequence or the
kappa light chain constant region polypeptide domain (Cκ) nucleic acid coding sequence of
step (c) also can be generated by an amplification reaction. Any amplification reaction or 
system can be used. The amplification reaction can comprise a polymerase chain reaction 
(PCR) amplification reaction using a pair of oligonucleotide primers. The amplification 
reaction can comprise a ligase chain reaction (LCR), a transcription amplification, a self- 
sustained sequence replication, a Q Beta replicase amplification and other RNA polymerase 
mediated techniques. In one aspect, the oligonucleotide primers can further comprise one or 
more restriction enzyme sites. 

In alternative aspects, the lambda light chain variable region polypeptide 
domain (Vλ) nucleic acid coding sequence, the kappa light chain variable region polypeptide
domain (Vκ) nucleic acid coding sequence, the lambda light chain constant region
polypeptide domain (Cλ) nucleic acid coding sequence or the kappa light chain constant
region polypeptide domain (Cκ) nucleic acid coding sequence are between about 99 and
about 600 base pair residues in length, between about 198 and about 402 base pair residues in
length or between about 300 and about 320 base pair residues in length.

In one aspect, the amplified nucleic acid is a mammalian nucleic acid, such as 
a human or a mouse nucleic acid. The amplified nucleic acid can be a genomic DNA, a 
cDNA or an RNA.

In alternative aspects, an oligonucleotide encoding a J region polypeptide
domain of step (b) is between about 9 and about 99 base pair residues in length, between
about 18 and about 81 base pair residues in length or between about 36 and about 63 base
pair residues in length.

In alternative aspects, the joining step to generate a chimeric nucleic acid 
comprises a DNA ligase, a transcription or an amplification reaction. The amplification 
reaction can comprise a polymerase chain reaction (PCR) amplification reaction, a ligase 
chain reaction (LCR), a transcription amplification, a self-sustained sequence replication, a Q 
Beta replicase amplification and other RNA polymerase mediated techniques. The 
amplification reaction can comprise use of oligonucleotide primers. The oligonucleotide 
primers can further comprise a restriction enzyme site. The transcription can comprise a 
DNA polymerase transcription reaction. 

The invention provides a library of chimeric nucleic acids encoding a plurality 
of chimeric antigen binding polypeptides, the library made by a method comprising the 
following steps: (a) providing a plurality of nucleic acids encoding an antibody heavy chain 
variable region polypeptide domain (VH); (b) providing a plurality of oligonucleotides
encoding a D region polypeptide domain (VD); (c) providing a plurality of oligonucleotides
encoding a J region polypeptide domain (VJ); (d) providing a plurality of nucleic acids
encoding a heavy chain constant region polypeptide domain (CH); (e) joining together a
nucleic acid of step (a), a nucleic acid of step (d) and an oligonucleotide of step (b) and step 
(c), wherein the oligonucleotides of step (b) and step (c) are placed between the nucleic acids 
of step (a) and step (d) to generate a V-D-J-C chimeric nucleic acid coding sequence 
encoding a chimeric antigen binding polypeptide, and repeating this joining step to generate a 
library of chimeric nucleic acid coding sequences encoding a library of chimeric antigen 
binding polypeptides. 

In alternative aspects, the antigen binding polypeptide comprises a single
chain antibody, a Fab fragment, an Fd fragment or an antigen binding complementarity
determining region (CDR). The antigen binding polypeptide can comprise a μ, γ, γ2, γ3, γ4,
δ, ε, α1 or α2 constant region. The heavy chain variable region polypeptide domain (VH) or
the heavy chain constant region polypeptide domain (CH) nucleic acid coding sequence can
be generated by an amplification reaction. The amplification reaction can comprise a
polymerase chain reaction (PCR) amplification reaction, a ligase chain reaction (LCR), a
transcription amplification, a self-sustained sequence replication, a Q Beta replicase 
amplification and other RNA polymerase mediated techniques. The amplification reaction 
can comprise using a pair of oligonucleotide primers. The oligonucleotide primers can 
further comprise a restriction enzyme site. 

In alternative aspects, the heavy chain variable region polypeptide domain 
(VH) nucleic acid coding sequence or the heavy chain constant region polypeptide domain
(CH) nucleic acid coding sequence is between about 99 and about 600 base pair residues in
length, between about 198 and about 402 base pair residues in length, or between about 300 
and about 320 base pair residues in length. 

The amplified nucleic acid can be a mammalian nucleic acid, such as a human 
or a mouse nucleic acid. The amplified nucleic acid can be a genomic DNA, a cDNA or an 
RNA, e.g., an mRNA. 

In alternative aspects, the oligonucleotide encoding a D region polypeptide 
domain of step (b) or a J region polypeptide domain of step (c) is between about 9 and about 
99 base pair residues in length, between about 18 and about 81 base pair residues in length, 
or between about 36 and about 63 base pair residues in length. 

The joining of step (e) to generate a chimeric nucleic acid can comprise a 
DNA ligase, a transcription or an amplification reaction. The amplification reaction 
comprises a polymerase chain reaction (PCR) amplification reaction, a ligase chain reaction 
(LCR), a transcription amplification, a self-sustained sequence replication, a Q Beta replicase 
amplification and other RNA polymerase mediated techniques. The amplification reaction 
can comprise use of oligonucleotide primers. The oligonucleotide primers can further 
comprise a restriction enzyme site. The transcription can comprise a DNA polymerase 
transcription reaction. 

The invention provides an expression vector comprising a chimeric nucleic 
acid selected from a library of the invention. The invention provides a transformed cell 
comprising a chimeric nucleic acid selected from a library of the invention. The invention 
provides a transformed cell comprising an expression vector of the invention. The invention 
provides a non-human transgenic animal comprising a chimeric nucleic acid selected from a 
library of the invention.

The invention provides a method for making a chimeric antigen binding
polypeptide comprising the following steps: (a) providing a nucleic acid encoding a lambda
light chain variable region polypeptide domain (Vλ) or a kappa light chain variable region
polypeptide domain (Vκ); (b) providing an oligonucleotide encoding a J region polypeptide
domain (VJ); (c) providing a nucleic acid encoding a lambda light chain constant region
polypeptide domain (Cλ) or a kappa light chain constant region polypeptide domain (Cκ); (d)
joining together a nucleic acid of step (a), a nucleic acid of step (c) and an oligonucleotide of
step (b), wherein the oligonucleotide of step (b) is placed between the nucleic acids of step
(a) and step (c) to generate a V-J-C chimeric nucleic acid coding sequence encoding a
chimeric antigen binding polypeptide. 

The invention provides a method for making a library of chimeric antigen 
binding polypeptides comprising the following steps: (a) providing a plurality of nucleic 
acids encoding a lambda light chain variable region polypeptide domain (Vλ) or a kappa light
chain variable region polypeptide domain (Vκ); (b) providing a plurality of oligonucleotides
encoding a J region polypeptide domain (VJ); (c) providing a plurality of nucleic acids
encoding a lambda light chain constant region polypeptide domain (Cλ) or a kappa light chain
constant region polypeptide domain (Cκ); (d) joining together a nucleic acid of step (a), a
nucleic acid of step (c) and an oligonucleotide of step (b), wherein the oligonucleotide of step
(b) is placed between the nucleic acids of step (a) and step (c) to generate a V-J-C chimeric
nucleic acid coding sequence encoding a chimeric antigen binding polypeptide, and repeating
this joining step to generate a library of chimeric nucleic acid coding sequences encoding a 
library of chimeric antigen binding polypeptides. 

The invention provides a method for making a chimeric antigen binding 
polypeptide comprising the following steps: (a) providing a nucleic acid encoding an 
antibody heavy chain variable region polypeptide domain (VH); (b) providing an
oligonucleotide encoding a D region polypeptide domain (VD); (c) providing an
oligonucleotide encoding a J region polypeptide domain (VJ); (d) providing a nucleic acid
encoding a heavy chain constant region polypeptide domain (CH); (e) joining together a
nucleic acid of step (a), a nucleic acid of step (d) and an oligonucleotide of step (b) and step
(c), wherein the oligonucleotides of step (b) and step (c) are placed between the nucleic acids
of step (a) and step (d) to generate a V-D-J-C chimeric nucleic acid coding sequence
encoding a chimeric antigen binding polypeptide. 

The invention provides a method for making a library of chimeric antigen 
binding polypeptides comprising the following steps: (a) providing a plurality of nucleic 
acids encoding an antibody heavy chain variable region polypeptide domain (VH); (b)
providing a plurality of oligonucleotides encoding a D region polypeptide domain (VD); (c)
providing a plurality of oligonucleotides encoding a J region polypeptide domain (VJ); (d)
providing a plurality of nucleic acids encoding a heavy chain constant region polypeptide
domain (CH); (e) joining together a nucleic acid of step (a), a nucleic acid of step (d) and an
oligonucleotide of step (b) and step (c), wherein the oligonucleotides of step (b) and step (c) 
are placed between the nucleic acids of step (a) and step (d) to generate a V-D-J-C chimeric 
nucleic acid coding sequence encoding a chimeric antigen binding polypeptide, and repeating 
this joining step to generate a library of chimeric nucleic acid coding sequences encoding a 
library of chimeric antigen binding polypeptides. 

The methods of the invention can further comprise expressing the nucleic acid
coding sequences encoding one or a library of chimeric antigen binding polypeptides. The
methods of the invention can further comprise screening the expressed chimeric antigen binding
polypeptide for its ability to specifically bind an antigen.

The methods of the invention can further comprise mutagenizing the nucleic acid
coding sequence encoding a chimeric antigen binding polypeptide by a method comprising
an optimized directed evolution system or a synthetic ligation reassembly, saturation
mutagenesis, or a combination thereof. The methods of the invention can further comprise
screening the mutagenized chimeric antigen binding polypeptide for its ability to specifically
bind an antigen. The methods of the invention can further comprise identifying a
mutagenized antigen binding site variant by its increased antigen binding affinity or antigen
binding specificity as compared to the affinity or specificity of the chimeric antigen binding
polypeptide before mutagenesis. The methods of the invention can further comprise
screening the mutagenized chimeric antigen binding polypeptide for its ability to specifically
bind an antigen by a method comprising phage display of the antigen binding site
polypeptide. The methods of the invention can further comprise screening the mutagenized
chimeric antigen binding polypeptide for its ability to specifically bind an antigen by a
method comprising expression of the expressed antigen binding site polypeptide in a liquid
phase. The methods of the invention can further comprise screening the mutagenized
chimeric antigen binding polypeptide for its ability to specifically bind an antigen by a
method comprising ribosome display of the antigen binding site polypeptide. The methods
of the invention can further comprise screening the chimeric antigen binding polypeptide for
its ability to specifically bind an antigen by a method comprising immobilizing the
polypeptide in a solid phase. The methods of the invention can further comprise screening
the chimeric antigen binding polypeptide for its ability to specifically bind an antigen by a
method comprising a capillary array. The methods of the invention can further comprise
screening the chimeric antigen binding polypeptide for its ability to specifically bind an
antigen by a method comprising a double-orificed container. The double-orificed container
can comprise a double-orificed capillary array. The double-orificed capillary array can be a
GIGAMATRIX™ capillary array.

The invention provides a method for making a library of chimeric antigen
binding polypeptides comprising the following steps: (a) providing a plurality of V-J-C
chimeric nucleic acids encoding a chimeric antigen binding polypeptide made by a method as 
set forth in claim 48 or a plurality of V-D-J-C chimeric nucleic acids encoding a chimeric 
antigen binding polypeptide made by a method as set forth in claim 50; (b) providing a 
plurality of oligonucleotides, wherein each oligonucleotide comprises a sequence 
homologous to a chimeric nucleic acid of step (a), thereby targeting a specific sequence of 
the chimeric nucleic acid, and a sequence that is a variant of the chimeric nucleic acid; and 
(c) generating "n" number of progeny polynucleotides comprising non-stochastic sequence 
variations by replicating the chimeric nucleic acid of step (a) with the oligonucleotides of 
step (b), wherein n is an integer, thereby generating a library of chimeric antigen binding 
polypeptides. 

In alternative aspects, the sequence homologous to the chimeric nucleic acid is 
x bases long, wherein x is an integer between 3 and 100, between 5 and 50 or between 10
and 30. In one aspect, the sequence that is a variant of the chimeric nucleic acid is x bases
long, wherein x can be an integer between 1 and 50 or between 2 and 20. The
oligonucleotide of step (b) can further comprise a second sequence homologous to the
chimeric nucleic acid, wherein the variant sequence is flanked by the sequences homologous
to the chimeric nucleic acid. In one aspect, the second sequence that is a variant of the 
chimeric nucleic acid is x bases long, wherein x is an integer between 1 and 50, or, where x is 
3, 6, 9 or 12. 

In one aspect, the oligonucleotides can comprise variant sequences targeting a
chimeric nucleic acid codon, thereby generating a plurality of progeny chimeric
polynucleotides comprising a plurality of variant codons. The variant sequences can
generate variant codons encoding all nineteen naturally-occurring amino acid variants for a
targeted codon, thereby generating all nineteen possible natural amino acid variations at the
residue encoded by the targeted codon. The oligonucleotides can comprise variant sequences
targeting a plurality of chimeric nucleic acid codons. The oligonucleotides can comprise
variant sequences targeting all of the codons in the chimeric nucleic acid, thereby generating
a plurality of progeny polypeptides wherein all amino acids are non-stochastic variants of the
polypeptide encoded by the chimeric nucleic acid. The variant sequences can generate
variant codons encoding all nineteen naturally-occurring amino acid variants for all of the
chimeric nucleic acid codons, thereby generating a plurality of progeny polypeptides wherein
all amino acids are non-stochastic variants of the polypeptide encoded by the chimeric 
nucleic acid and a variant for all nineteen possible natural amino acids at all of the codons. 
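
The "nineteen variants" figure can be made concrete with a short sketch: for a targeted codon, one replacement codon is chosen for each of the other nineteen naturally-occurring amino acids. The standard genetic code is assumed, the choice of the first codon found for each amino acid is arbitrary, and the function name `saturation_variants` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def saturation_variants(codon):
    """Map each amino acid other than the original to one codon encoding it."""
    original = CODON_TABLE[codon]
    variants = {}
    for candidate, aa in CODON_TABLE.items():
        # Skip stop codons, the original residue, and residues already covered.
        if aa not in ("*", original) and aa not in variants:
            variants[aa] = candidate
    return variants

variants = saturation_variants("ATG")  # ATG encodes methionine (M)
print(len(variants))
```

Applying the same function to every codon of a coding sequence would enumerate a variant set covering all nineteen alternative amino acids at every position, in the sense described above.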

In alternative aspects of the methods, in generating "n" number of progeny
polynucleotides comprising non-stochastic sequence variations, "n" is an integer between 1
and about 10^30, between about 10^2 and about 10^20, or between about 10^2 and about 10^10.

In alternative aspects of the methods, the replicating of step (c) comprises an 
enzyme-based replication, such as a polymerase-based amplification reaction. The
amplification reaction can comprise a polymerase chain reaction (PCR). The enzyme-based
replication can comprise an error-free polymerase reaction.

In one aspect of the methods, an oligonucleotide of step (b) further comprises
a nucleic acid sequence capable of introducing one or more nucleotide residues into the
template polynucleotide. The oligonucleotide of step (b) can further comprise a nucleic acid
sequence capable of deleting one or more residues from the template polynucleotide. The
oligonucleotide of step (b) can further comprise a nucleic acid sequence capable of adding
one or more stop codons to the template polynucleotide.

The invention provides a method for making a library of chimeric antigen
binding polypeptides comprising the following steps: (a) providing x number of V-J-C
chimeric nucleic acids encoding a chimeric antigen binding polypeptide made by a method as
set forth in claim 48 or x number of V-D-J-C chimeric nucleic acids encoding a chimeric
antigen binding polypeptide made by a method as set forth in claim 50; (b) providing y
number of building block polynucleotides, wherein y is an integer, and the building block
polynucleotides are designed to cross-over reassemble with a chimeric nucleic acid of step
(a) at predetermined sequences and comprise a sequence that is a variant of the chimeric
nucleic acid and a sequence homologous to the chimeric nucleic acid flanking the variant
sequence; and, (c) combining at least one building block polynucleotide with at least one
chimeric nucleic acid such that the building block polynucleotide cross-over reassembles
with the chimeric nucleic acid to generate non-stochastic progeny chimeric polynucleotides,
thereby generating a library of polynucleotides encoding chimeric antigen binding
polypeptides.

In alternative aspects of the method, x is an integer between 1 and about 10^10,
or between about 10 and about 10^2, or x is an integer selected from the group consisting of 1,
2, 3, 4, 5, 6, 7, 8, 9 and 10.

In one aspect, a plurality of building block polynucleotides are used and the
variant sequences target a chimeric nucleic acid codon to generate a plurality of progeny
polynucleotides that are variants of the targeted codon, thereby generating a plurality of
natural amino acid variations at a residue in a polypeptide encoded by the chimeric nucleic
acid. In one aspect, the variant sequences generate variant codons encoding all nineteen
naturally-occurring amino acid variants for the targeted codon, thereby generating all
nineteen possible natural amino acid variations at the residue encoded by the targeted codon
in a polypeptide encoded by the chimeric nucleic acid.

In one aspect, a plurality of building block polynucleotides are used, and the
variant sequences target a plurality of chimeric nucleic acid codons, thereby generating a
plurality of codons that are variants of the targeted codons and a plurality of natural amino
acid variations at a plurality of residues encoded by the targeted codons in a polypeptide
encoded by the chimeric nucleic acid. In one aspect, the variant sequences generate variant
codons in all of the codons in the chimeric nucleic acid, thereby generating a plurality of
progeny polypeptides wherein all amino acids are non-stochastic variants of the polypeptide
encoded by the chimeric nucleic acid. In one aspect, the variant sequences generate variant
codons encoding all nineteen naturally-occurring amino acid variants for all of the chimeric
nucleic acid codons, thereby generating a plurality of progeny polypeptides wherein all
amino acids are non-stochastic variants of the polypeptide encoded by the chimeric nucleic
acid and a variant for all nineteen possible natural amino acids at all of the codons. In one
aspect, all of the codons in an antigen binding site are targeted.

In alternative aspects, the library comprises between 1 and about 10^30
members, between about 10^2 and about 10^20 members or between about 10^3 and about 10^10
members. In alternative aspects, an end of a building block polynucleotide comprises at least
about 6 nucleotides homologous to a chimeric nucleic acid, at least about 15 nucleotides
homologous to a chimeric nucleic acid or at least about 21 nucleotides homologous to a
chimeric nucleic acid.
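
Solely as a hedged sketch, a building block comprising a variant codon flanked on each end by sequence homologous to the chimeric nucleic acid can be assembled as shown below. The helper `make_building_block`, the template sequence and the choice of identical (rather than merely homologous) flanks are assumptions made for illustration and are not taken from the specification.

```python
def make_building_block(template: str, codon_index: int,
                        variant_codon: str, flank: int = 15) -> str:
    """Build an oligonucleotide carrying `variant_codon` in place of the
    codon at `codon_index` (0-based), flanked on both sides by `flank`
    nucleotides copied from `template`.
    """
    if len(variant_codon) != 3:
        raise ValueError("variant_codon must be exactly three nucleotides")
    start = codon_index * 3
    end = start + 3
    if start - flank < 0 or end + flank > len(template):
        raise ValueError("template too short for the requested flank length")
    return template[start - flank:start] + variant_codon + template[end:end + flank]


if __name__ == "__main__":
    # Hypothetical 11-codon template (33 nucleotides).
    template = "ATGGCTAAAGGTCTGGAAACCCATTGGCGTTAA"
    # Replace codon 5 (GAA) with GCG, using 6-nucleotide flanks.
    print(make_building_block(template, 5, "GCG", flank=6))
```

The `flank` parameter corresponds to the homologous end length, for which the text above recites at least about 6, 15 or 21 nucleotides.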

In one aspect, combining one or more building block polynucleotides with a
chimeric nucleic acid comprises z cross-over events between the building block
polynucleotides and the chimeric nucleic acid, wherein z is an integer between 1 and about
10^20, between about 10 and about 10^10, or between about 10^2 and about 10^5.

In alternative aspects, a non-stochastic progeny chimeric polynucleotide
differs from a chimeric nucleic acid in z number of residues, wherein z is between 1 and
about 10^4 or between 10 and about 10^3, or z is 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10.

In alternative aspects, a non-stochastic progeny chimeric polynucleotide
differs from a chimeric nucleic acid in z number of codons, wherein z is between 1 and about
10^4, z is between 10 and about 10^3, or z is 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10.

In alternative aspects, the methods of the invention further comprise
non-stochastic modification of all or a part of the sequence of a chimeric antibody coding
sequence of the invention. The modification can be by any method, including, e.g., by
"saturation mutagenesis" or "GSSM," "optimized directed evolution system" and "synthetic
ligation reassembly" or "SLR" or any combination of these methods.

Nucleic acids encoding the chimeric antibodies of the invention can be further
manipulated or altered by any means, including random or stochastic methods, or
non-stochastic, or "directed evolution," methods. For example, nucleic acids encoding the chimeric
antibodies of the invention can be manipulated by step-wise nucleic acid reassembly (see
Example 3, below), saturation mutagenesis, an optimized directed evolution system,
synthetic ligation reassembly, or a combination thereof, as described herein. Nucleic acids
encoding the chimeric antibodies of the invention can be manipulated by a method
comprising gene site saturated mutagenesis (GSSM), error-prone PCR, shuffling,
oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo
mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble
mutagenesis, site-specific mutagenesis, gene reassembly, synthetic ligation reassembly
(SLR) or a combination thereof. These nucleic acids can be manipulated by recombination,
recursive sequence recombination, phosphothioate-modified DNA mutagenesis,
uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair
mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic
mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification
mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer
creation or a combination thereof.

The details of one or more embodiments of the invention are set forth in the 
accompanying drawings and the description below. Other features, objects, and advantages 
of the invention will be apparent from the description and drawings, and from the claims. 

All publications, GenBank Accession references (sequences), ATCC Deposits, 
patents and patent applications cited herein are hereby expressly incorporated by reference 
for all purposes. 

DESCRIPTION OF DRAWINGS 

Figure 1 schematically illustrates an exemplary "elongation cycle" of a gene
building method of the invention, the method comprising: "loading" starter oligo onto
substrate; ligation (with any ligase, e.g., T4 ligase or E. coli ligase); wash; fill-in ends; wash;
cut with restriction endonuclease; wash; repeat (reiterate cycle), as discussed in detail in
Example 1, below.

Figure 2 schematically illustrates a cloning vector designed to reassemble
antibody light chains according to the methods of the invention, as discussed in Example 2.

Figure 3 schematically illustrates an exemplary scheme to reassemble lambda
light chains according to the methods of the invention, as discussed in Example 2.

Figure 4 schematically illustrates an exemplary scheme to reassemble kappa
light chains according to the methods of the invention, as discussed in Example 2.

Figure 5 schematically illustrates an exemplary scheme to reassemble
antibody heavy chains according to the methods of the invention, as discussed in Example 2.

Figure 6 illustrates an exemplary procedure for the reassembly of three 
esterase genes, as discussed in Example 3. 

Figure 7A illustrates the elution of reassembled DNA from the solid support
using alternative restriction sites engineered in the biotinylated hook, as discussed in 
Example 3. Figure 7B illustrates the elution of final reassembled products from the solid 
support, as discussed in Example 3. 

Figure 8 illustrates an exemplary software program used in the methods of the
invention.

Like reference symbols in the various drawings indicate like elements. 

DETAILED DESCRIPTION 

METHODS FOR PURIFYING AND IDENTIFYING DOUBLE-STRANDED NUCLEIC 
ACIDS LACKING BASE PAIR MISMATCHES, INSERTION/DELETION LOOPS OR 
NUCLEOTIDE GAPS 

The invention provides methods for identifying and purifying double-stranded 
polynucleotides lacking nucleotide gaps, base pair mismatches and insertion/deletion loops. 

Definitions 

Unless defined otherwise, all technical and scientific terms used herein have
the meaning commonly understood by a person skilled in the art to which this invention
belongs. As used herein, the following terms have the meanings ascribed to them unless
specified otherwise.

The phrase "polypeptides that specifically bind to a nucleotide gap or gaps, a 
base pair mismatch and/or an insertion/deletion loop in a double stranded polynucleotide" 
include all polypeptides, natural or synthetic, that can specifically bind to a nucleoside base 
pair mismatch, an insertion/deletion loop and/or a nucleotide gap or gaps in a double 

The term "DNA glycosylase" includes all natural or synthetic DNA
glycosylase enzymes that initiate base-excision repair of G:U/T mismatches. The natural
DNA glycosylase enzymes include, e.g., bacterial mismatch-specific uracil-DNA glycosylase
(MUG) DNA repair enzymes and eukaryotic thymine-DNA glycosylase (TDG) enzymes, as
described in further detail, below.

The term "intein" includes all polypeptide sequences that are self-splicing.
Inteins are intron-like elements that are removed post-translationally by self-splicing, as
described in further detail, below.

The term "saturation mutagenesis" or "GSSM" includes a method that uses
degenerate oligonucleotide primers to introduce point mutations into a polynucleotide, as
described in detail herein.

The term "optimized directed evolution system" or "optimized directed
evolution" includes a method for reassembling fragments of related nucleic acid sequences,
e.g., related genes, as explained in detail herein.

The term "synthetic ligation reassembly" or "SLR" includes a method of
ligating oligonucleotide fragments in a non-stochastic fashion, as explained in detail herein.

The terms "nucleic acid" and "polynucleotide" as used herein refer to a 
deoxyribonucleotide or ribonucleotide in either single- or double-stranded form. The terms 
encompass all nucleic acids, e.g., oligonucleotides, and modifications analogues of natural 
nucleotides, e.g., nucleic acids with modified internucleoside linkages. The terms also 
encompass nucleic-acid-like structures with synthetic backbones. Synthetic backbone 
analogues include, e.g., phosphodiester, phosphorothioate, phosphorodithioate, 
methylphosphonate, phosphoramidate, alkyl phosphotriester, sulfamate, 3'-thioacetal, 
methylene(methylimino), 3'-N-carbamate, morpholino carbamate, and peptide nucleic acids 
(PNAs); see Oligonucleotides and Analogues, a Practical Approach, edited by F. Eckstein, 
IRL Press at Oxford University Press (1991); Antisense Strategies, Annals of the New York 
Academy of Sciences, Volume 600, Eds. Baserga and Denhardt (NYAS 1992); Milligan 
(1993) J. Med Chem. 36:1923-1937; Antisense Research and Applications (1993, CRC 
Press). PNAs contain non-ionic backbones, such as N-(2-aminoethyl) glycine units, and can 
be used as probes (see, e.g., U.S. Patent No. 5,871,902). Phosphorothioate linkages are
described, e.g., in WO 97/03211; WO 96/39154; Mata (1997) Toxicol. Appl. Pharmacol.

MutS DNA repair enzymes include all MutS DNA repair enzymes, including
synthetic (e.g., genetically reengineered) variations, and eukaryotic (e.g., mammalian)
homologues of bacterial enzymes, that can bind a nucleoside base pair mismatch or an
insertion/deletion loop, including, e.g., the Thermus aquaticus (Taq) and Pseudomonas
aeruginosa MutS DNA repair enzymes. The MutS DNA repair enzyme can be used in the
form of a dimer. For example, it can be a homodimer of a MutS homolog, e.g., a human
MutS homolog, a murine MutS homolog, a rat MutS homolog, a Drosophila MutS homolog,
or a yeast MutS homolog, such as a Saccharomyces cerevisiae MutS homolog. See, e.g., U.S.
Patent No. 6,333,153; Pezza (2002) Biochem J. 361(Pt 1):87-95; Biswas (2001) J. Mol. Biol.
305:805-816; Biswas (2000) Biochem J. 347 Pt 3:881-886; Biswas (1999) J. Biol. Chem.
274:23673-23678. MutS has been shown to preferentially bind a nucleic acid heteroduplex
containing a deletion of a single base, see, e.g., Biswas (1997) J. Biol. Chem. 272:13355-
13364; see also, Su (1986) Proc. Natl. Acad. Sci. 83:5057-5061; Malkov (1997) J. Biol.
Chem. 272:23811-23817.

Fpg DNA repair enzymes include all Fpg DNA repair enzymes, including
synthetic (e.g., genetically reengineered) variations, and eukaryotic (e.g., mammalian)
homologues of bacterial enzymes, that can bind a nucleoside base pair mismatch or an
insertion/deletion loop, including, e.g., the Fpg enzyme from Escherichia coli. See, e.g.,
Leipold (2000) Biochemistry 39:14984-14992.

MutY DNA repair enzymes include all MutY DNA repair enzymes, including 
synthetic (e.g., genetically reengineered) variations, and eukaryotic (e.g., mammalian) 
homologues of bacterial enzymes, that can bind a nucleoside base pair mismatch or an 
insertion/deletion loop (see, e.g., Porello (1998) Biochemistry 37:14756-14764; Williams 
(1999) Biochemistry 38:15417-15424). 

DNA glycosylase includes all natural or synthetic DNA glycosylase enzymes 
that initiate base-excision repair of G:U/T mismatches. The natural DNA glycosylase 
enzymes form a homologous family of DNA glycosylase enzymes that initiate base-excision 
repair of G:U/T mismatches, including, e.g., bacterial mismatch-specific uracil-DNA 
glycosylase (MUG) DNA repair enzymes (see, e.g., Barrett (1999) EMBO J. 18:6599-6609)
and eukaryotic thymine-DNA glycosylase (TDG) enzymes (see, e.g., Barrett (1999) ibid;
Barrett (1998) Cell 92:117-129). See also Pearl (2000) Mutat. Res. 460:165-181;
Niederreither (1998) Oncogene 17:1577-1585.

Additional nucleotide gap binding polypeptides include, e.g., DNA
polymerase deltas, such as the DNA polymerase delta isolated in the teleost fish Misgurnus
fossilis (see, e.g., Sharova (2001) Biochemistry (Mosc) 66:402-409); DNA polymerase betas,
see, e.g., Bhattacharyya (2001) Biochemistry 40:9005-9013; DNA topoisomerases, such as
type IB DNA topoisomerase V, as in the hyperthermophile Methanopyrus kandleri described
by Belova (2001) Proc. Natl. Acad. Sci. USA 98:6015-6020; ribosomal proteins, e.g., S3
ribosomal proteins such as the Drosophila S3 ribosomal protein described by Hegde (2001)
J. Biol. Chem. 276:27591-27596.

The methods of the invention comprise contacting the double-stranded
polynucleotides to be purified of base pair mismatches, insertion/deletion loops and/or a
nucleotide gap or gaps with the polypeptides under conditions wherein a
mismatch-, an insertion/deletion loop- and/or a gap-binding polypeptide can specifically
bind to a base pair mismatch or an insertion/deletion loop or a nucleotide gap or gaps. These
conditions are well known in the art, as described, e.g., in the references cited herein, or can
be determined or optimized by one skilled in the art without undue experimentation. For
example, U.S. Patent No. 6,333,153 describes a method comprising contacting a MutS dimer
and the mismatched duplex DNA in the presence of a binding solution comprising ADP and
optionally ATP. The concentration of ATP, if present, in the binding solution is less than
about 3 micromolar. The MutS dimer binds ADP, and the MutS ADP-bound dimer associates
with a mismatched region of the duplex DNA.
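
The kinds of defect recognized by such binding polypeptides can be pictured, purely for illustration, with a small in silico check of an aligned duplex. The function `find_defects`, the use of "-" to mark an unpaired position, and the example strands are assumptions introduced here; the sketch does not model protein binding and cannot by itself distinguish an insertion/deletion loop from a nucleotide gap.

```python
# Watson-Crick pairing partners for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_defects(top: str, bottom: str) -> list[tuple[int, str]]:
    """Compare two pre-aligned strands position by position.

    `top` is written 5'->3' and `bottom` 3'->5', so position i of one
    strand pairs with position i of the other. A "-" marks a position
    with no partner (a gap or a looped-out base).
    """
    if len(top) != len(bottom):
        raise ValueError("strands must be aligned to the same length")
    defects = []
    for i, (t, b) in enumerate(zip(top, bottom)):
        if t == "-" or b == "-":
            defects.append((i, "gap_or_loop"))
        elif COMPLEMENT[t] != b:
            defects.append((i, "mismatch"))
    return defects


if __name__ == "__main__":
    print(find_defects("ATGC", "TACG"))  # fully complementary duplex
    print(find_defects("ATGC", "TGCG"))  # T opposite G at position 1
    print(find_defects("ATGC", "TA-G"))  # unpaired G at position 2
```

A duplex for which the returned list is empty corresponds to the defect-free species that the purification methods are intended to enrich.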

In mammalian cells most altered bases in DNA are repaired through a single-nucleotide
patch base excision repair mechanism. Base excision repair is initiated by a DNA
glycosylase that removes a damaged base and generates an abasic site (AP site). This AP site
is further processed by an AP endonuclease activity that incises the phosphodiester bond
adjacent to the AP site and generates a strand break containing 3'-OH and 5'-sugar phosphate
ends. In mammalian cells, the 5'-sugar phosphate is removed by the AP lyase activity of
DNA polymerase beta. The same enzyme also fills the gap, and the DNA ends are finally
rejoined by DNA ligase. Thus, in addition to DNA polymerases such as DNA polymerase
beta, the methods of the invention also can use DNA glycosylases as oligonucleotide or
polynucleotide binding polypeptides alone or in conjunction with other base pair mismatch-,
insertion/deletion loop- or nucleotide gap- binding polypeptides. See, e.g., Podlutsky (2001) 
Biochemistry 40:809-813. 

Marker and selection polypeptides 

The invention provides methods comprising purifying a double-stranded
polynucleotide lacking base pair mismatches, insertion/deletion loops and/or a nucleotide gap 
or gaps, wherein the polynucleotide encodes a fusion protein coding sequence that comprises 
a coding sequence for a polypeptide of interest upstream of and in frame with a coding 
sequence for a marker or a selection polypeptide. The use of a marker or a selection 
polypeptide coding sequence downstream of and in frame with a polypeptide of interest acts 
to confirm that the polypeptide of interest coding sequence lacks defects that would prevent 
transcription or translation of the fusion protein sequence. Because the marker or a selection 
polypeptide coding sequence is downstream and in frame with the polypeptide of interest 
coding sequence, any such defects would prevent transcription and/or translation of the 
marker or selection polypeptide. For example, this scheme can be used to segregate or purify 
out polypeptide of interest coding sequences lacking base pair mismatches, insertion/deletion 
loops and/or a nucleotide gap or gaps from those with a defect that would prevent 
transcription or translation of the sequence, the defect including, e.g., base pair mismatches, 
insertion/deletion loops and/or gap(s). 
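
As a simplified, hedged sketch of this logic, the following checks whether a coding sequence for a polypeptide of interest would leave a downstream marker readable in frame. The function `downstream_marker_readable` and the test sequences are hypothetical; the sketch assumes the polypeptide of interest coding sequence carries no stop codon of its own and considers only frameshifts and premature stop codons, not defects in transcription.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def downstream_marker_readable(poi_cds: str) -> bool:
    """Return True if a marker fused directly downstream of `poi_cds`
    would be translated in frame.

    Two defects are detected: a length that is not a multiple of three
    (frameshift from an insertion or deletion) and an in-frame stop
    codon (premature termination before the marker).
    """
    if len(poi_cds) % 3 != 0:
        return False
    codons = [poi_cds[i:i + 3] for i in range(0, len(poi_cds), 3)]
    return not any(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    print(downstream_marker_readable("ATGGCTAAAGGT"))  # intact sequence
    print(downstream_marker_readable("ATGGCAAAGGT"))   # one-base deletion
    print(downstream_marker_readable("ATGTAAAAAGGT"))  # premature stop
```

In the scheme described above the same discrimination is made by the cell itself: only constructs for which such a check would succeed express the marker or selection polypeptide.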

Selection markers can be incorporated to confer a phenotype to facilitate 
selection of cells transformed with the sequences purified by the methods of the invention. 
For example, a marker selection polypeptide can comprise an enzyme, e.g., LacZ encoding a 
polypeptide with beta-galactosidase activity which, when expressed in a transformed cell and 
exposed to the appropriate substrate will produce a detectable marker, e.g., a color. See, e.g., 
Jain (1993) Gene 133:99-102; St Pierre (1996) Gene 169:65-68; Pessi (2001) Microbiology 
147(Pt 8):1993-1995. See also U.S. Patent Nos. 5,444,161; 4,861,718; 4,708,929; 4,668,622. 
Selection markers can code for episomal maintenance and replication such that integration 
into the host genome is not required. Selection markers can code for chloramphenicol acetyl 
transferase (CAT); an enzyme-substrate reaction is monitored by addition of an exogenous 
electron carrier and a tetrazolium salt. See, e.g., U.S. Patent No. 6,225,074. 

The marker can also encode antibiotic, herbicide or drug resistance to permit
selection of those cells transformed with the desired DNA sequences. For example, 
antibiotic resistance can be conferred by herpes simplex thymidine kinase (conferring 
resistance to ganciclovir), chloramphenicol resistance enzymes (see, e.g., Harrod (1997) 
Nucleic Acids Res. 25:1720-1726), kanamycin resistance enzymes, aminoglycoside 
phosphotransferase (conferring resistance to G418), bleomycin resistance enzymes, 
hygromycin resistance enzymes, and the like. The marker can also encode a herbicide 
resistance, e.g., chlorosulfuron or Basta. Because selectable marker genes conferring 
resistance to substrates like neomycin or hygromycin can only be utilized in tissue culture, 
chemoresistance genes are also used as selectable markers in vitro and in vivo. The marker 
can also encode enzymes conferring resistance to a drug, e.g., an ouabain-resistant (Na, K)-
ATPase; a MDR1 multidrug transporter (confers resistance to certain cytotoxic drugs), and
the like. Various target cells are rendered resistant to anticancer drugs by transfer of
chemoresistance genes encoding P-glycoprotein, the multidrug resistance-associated protein-
transporter, dihydrofolate reductase, glutathione-S-transferase, O6-alkylguanine DNA
alkyltransferase, or aldehyde reductase. See, e.g., Licht (1995) Cytokines Mol. Ther. 1:11- 
20; Blondelet-Rouault (1997) Gene 190:315-317; Aubrecht (1997) J. Pharmacol. Exp. Ther. 
281:992-997; Licht (1997) Stem Cells 15:104-111; Yang (1998) Clin. Cancer Res. 4:731-741. 
See also U.S. Patent No. 5,851,804, describing chimeric kanamycin resistance genes; U.S. 
Patent No. 4,784,949. 

The marker or selection polypeptide can also comprise a sequence coding for 
a polypeptide with affinity to a known antibody to facilitate affinity purification, detection, or 
the like. Such detection- and purification-facilitating domains include, but are not limited to, 
metal chelating peptides such as polyhistidine tracts and histidine-tryptophan modules that 
allow purification on immobilized metals, protein A or biotin domains that allow purification, 
e.g., on immobilized immunoglobulin or streptavidin, and the domain utilized in the FLAGS 
extension/affinity purification system (Immunex Corp, Seattle WA). The inclusion of
cleavable linker sequences such as Factor Xa or enterokinase (Invitrogen, San Diego CA)
between the protein of interest and the second domain can also be used, e.g., to facilitate
purification and for ease of handling and using the protein of interest. For example, a fusion
protein can comprise six histidine residues followed by thioredoxin and an enterokinase
cleavage site (for example, see Williams (1995) Biochemistry 34:1787-1797). The histidine
residues facilitate detection and purification while the enterokinase cleavage site provides a
means for purifying the desired protein of interest from the remainder of the fusion protein.
Technology pertaining to vectors encoding fusion proteins and application of fusion proteins
are well described in the patent and scientific literature, see e.g., Kroll (1993) DNA Cell.
Biol. 12:441-53.

Inteins 

In one aspect, the marker or selection polypeptide coding sequence can be a 
self-splicing intein. Inteins are intron-like elements that are removed post-translationally by 
self-splicing. Thus, the methods of the invention can further comprise the self-splicing out of 
the marker or selection polypeptide intein coding sequence from the polypeptide of interest. 
Intein sequences are well known in the art. See, e.g., Colston (1994) Mol. Microbiol. 
12:359-363; Perler (1994) Nucleic Acids Res. 22:1125-1127; Perler (1997) Curr. Opin.
Chem. Biol. 1:292-299; Giriat (2001) Genet. Eng. (NY) 23:171-199. See also, U.S. Patent 
Nos. 5,795,731; 5,496,714. For example, because inteins are protein splicing elements that 
occur naturally as in-frame protein fusions, intein sequences can be designed or based on 
naturally occurring intein sequences. Inteins are phylogenetically widespread, having been 
found in all three biological kingdoms, eubacteria, archaea and eukaryotes. Alternatively,
they can be entirely synthetic splicing sequences. Intein nomenclature parallels that for RNA
splicing, whereby the coding sequences of a gene (exteins) are interrupted by sequences that 
specify the protein splicing element (intein). 
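
The extein/intein relationship can be illustrated, as a sketch only, by treating the precursor polypeptide as a string from which the intein segment is excised and the flanking exteins joined. The function `splice_intein`, the coordinates and the amino acid sequences are hypothetical and do not correspond to any naturally occurring intein.

```python
def splice_intein(precursor: str, intein_start: int,
                  intein_end: int) -> tuple[str, str]:
    """Excise the intein occupying precursor[intein_start:intein_end]
    and join the N- and C-terminal exteins.

    Returns (mature_polypeptide, excised_intein).
    """
    if not 0 <= intein_start <= intein_end <= len(precursor):
        raise ValueError("intein coordinates fall outside the precursor")
    n_extein = precursor[:intein_start]
    intein = precursor[intein_start:intein_end]
    c_extein = precursor[intein_end:]
    return n_extein + c_extein, intein


if __name__ == "__main__":
    # Hypothetical precursor: N-extein "MKTLE", intein "CLSFGHN",
    # C-extein "SGAVR".
    mature, intein = splice_intein("MKTLECLSFGHNSGAVR", 5, 12)
    print(mature, intein)
```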

Purifying error free polynucleotides 

In one aspect, the methods of the invention comprise purifying
double-stranded polynucleotides lacking a base pair mismatch, an insertion/deletion loop and/or a
nucleotide gap or gaps. Any purification methodology can be used, including use of
antibodies, binding molecules, size exclusion and the like.

Antibodies and immunoaffinity columns

In one aspect, antibodies are used to purify a double-stranded polynucleotide
lacking a base pair mismatch, an insertion/deletion loop or a nucleotide gap or gaps. For
example, antibodies can be designed to specifically bind directly to a base pair mismatch-,
insertion/deletion loop- or nucleotide gap-binding polypeptide, or, antibodies can bind to an
epitope bound to the base pair mismatch-, insertion/deletion loop- or nucleotide gap-binding
polypeptide. The antibody can be bound to a bead, such as a magnetized bead. See, e.g.,
U.S. Patent Nos. 5,981,297; 5,508,164; 5,445,971; 5,445,970. See also U.S. Patent Nos.
5,858,223 and 5,746,321, and U.S. Patent No. 6,312,910, describing a multistage electromagnetic
separator to separate magnetically susceptible materials suspended in fluids.

The separating can comprise use of an immunoaffinity column, wherein the
column comprises immobilized antibodies capable of specifically binding to the specifically
bound base pair mismatch-, insertion/deletion loop- or nucleotide gap-binding polypeptide
or an epitope bound to the base pair mismatch-, insertion/deletion loop- or nucleotide
gap-binding polypeptide. The sample is passed through an immunoaffinity column under
conditions wherein the immobilized antibodies are capable of specifically binding to the
specifically bound polypeptide or the epitope, or "tag," bound to the specifically bound
polypeptide.

Monoclonal or polyclonal antibodies to base pair mismatch-, insertion/deletion
loop- and/or nucleotide gap-binding polypeptides can be used.
Methods of producing polyclonal and monoclonal antibodies are known to those of skill in
the art and described in the scientific and patent literature, see, e.g., Coligan, Current
Protocols in Immunology, Wiley/Greene, NY (1991); Stites (eds.) Basic and Clinical
Immunology (7th ed.) Lange Medical Publications, Los Altos, CA ("Stites"); Goding,
Monoclonal Antibodies: Principles and Practice (2d ed.) Academic Press, New York,
NY (1986); Kohler (1975) Nature 256:495; Harlow (1988) Antibodies, a Laboratory
Manual, Cold Spring Harbor Publications, New York. Antibodies also can be generated in
vitro, e.g., using recombinant antibody binding site expressing phage display libraries, in
addition to the traditional in vivo methods using animals. See, e.g., Huse (1989) Science
246:1275; Ward (1989) Nature 341:544; Hoogenboom (1997) Trends Biotechnol. 15:62-70;
Katz (1997) Annu. Rev. Biophys. Biomol. Struct. 26:27-45.

The term "antibody" includes a peptide or polypeptide derived from, modeled
after or substantially encoded by an immunoglobulin gene or immunoglobulin genes, or
fragments thereof, capable of specifically binding an antigen or epitope, see, e.g.,
Fundamental Immunology, Third Edition, W.E. Paul, ed., Raven Press, N.Y. (1993); Wilson
(1994) J. Immunol. Methods 175:267-273; Yarmush (1992) J. Biochem. Biophys. Methods
25:85-97. The term antibody includes antigen-binding portions, i.e., "antigen binding sites,"
(e.g., fragments, subsequences, complementarity determining regions (CDRs)) that retain
capacity to bind antigen, including (i) a Fab fragment, a monovalent fragment consisting of
the VL, VH, CL and CH1 domains; (ii) a F(ab')2 fragment, a bivalent fragment comprising
two Fab fragments linked by a disulfide bridge at the hinge region; (iii) a Fd fragment
consisting of the VH and CH1 domains; (iv) a Fv fragment consisting of the VL and VH
domains of a single arm of an antibody; (v) a dAb fragment (Ward et al., (1989) Nature
341:544-546), which consists of a VH domain; and (vi) an isolated complementarity
determining region (CDR). Single chain antibodies are also included by reference in the
term "antibody."

Biotin/avidin separation systems

Any ligand/receptor model can be used to purify a double-stranded
polynucleotide lacking a base pair mismatch, an insertion/deletion loop and/or a nucleotide
gap or gaps. For example, a biotin can be attached to a base pair mismatch-, an
insertion/deletion loop- and/or a nucleotide gap-binding polypeptide, or, it can be part of a
fusion protein comprising a base pair mismatch-, an insertion/deletion loop- and/or a
nucleotide gap-binding polypeptide. The biotin-binding avidin is typically immobilized,
e.g., onto a bead, a magnetic material, a column, a gel and the like. The bead can be
magnetized. See, e.g., the U.S. Patents noted above for making and using magnetic particles
in purification techniques, and, describing various biotin-avidin binding systems and
methods for making and using them, U.S. Patent Nos. 6,287,792; 6,277,609; 6,214,974;
6,022,688; 5,484,701; 5,432,067; 5,374,516.

Generating and Manipulating Nucleic Acids 

The invention provides methods for purifying double-stranded 
polynucleotides lacking base pair mismatches, insertion/deletion loops and/or a nucleotide 
gap or gaps. Nucleic acids purified by the methods of the invention can be amplified, cloned, 
sequenced or further manipulated, e.g., their sequences can be further changed by SLR, GSSM
and the like. The polypeptides used in the methods of the invention can be expressed
recombinantly, synthesized or isolated from natural sources. These and other nucleic acids
needed to make and use the invention can be isolated from a cell, recombinantly generated or
made synthetically. The sequences can be isolated by, e.g., cloning and expression of cDNA
libraries, amplification of message or genomic DNA by PCR, and the like. In practicing the
methods of the invention, genes can be modified by manipulating a template nucleic acid, as 
described herein. The invention can be practiced in conjunction with any method or protocol 
or device known in the art, which are well described in the scientific and patent literature. 

General Techniques 

The nucleic acids used to practice this invention, whether RNA, cDNA, 
genomic DNA, vectors, viruses or hybrids thereof, may be isolated from a variety of sources, 
genetically engineered, amplified, and/or expressed/generated recombinantly. Recombinant
polypeptides generated from these nucleic acids can be individually isolated or cloned and 
tested for a desired activity. Any recombinant expression system can be used, including 
bacterial, mammalian, yeast, insect or plant cell expression systems. 

Alternatively, these nucleic acids can be synthesized in vitro by well-known 
chemical synthesis techniques, as described in, e.g., Adams (1983) J. Am. Chem. Soc. 
105:661; Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel (1995) Free Radic. 
Biol. Med. 19:373-380; Blommers (1994) Biochemistry 33:7886-7896; Narang (1979) Meth. 
Enzymol. 68:90; Brown (1979) Meth. Enzymol. 68:109; Beaucage (1981) Tetra. Lett. 
22:1859; U.S. Patent No. 4,458,066. 

Techniques for the manipulation of nucleic acids, such as, e.g., subcloning, 
ligations, labeling probes (e.g., random-primer labeling using Klenow polymerase, nick 
translation, amplification), sequencing, hybridization and the like are well described in the 
scientific and patent literature, see, e.g., Sambrook, ed., Molecular Cloning: a 
Laboratory Manual (2nd ed.), Vols. 1-3, Cold Spring Harbor Laboratory, (1989); 
Current Protocols in Molecular Biology, Ausubel, ed. John Wiley & Sons, Inc., New 
York (1997); Laboratory Techniques in Biochemistry and Molecular Biology: 
Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, 
Tijssen, ed. Elsevier, N.Y. (1993). 

Nucleic acids, vectors, capsids, polypeptides, and the like can be analyzed and 
quantified by any of a number of general means well known to those of skill in the art. These 
include, e.g., analytical biochemical methods such as NMR, spectrophotometry, radiography, 
electrophoresis, capillary electrophoresis, high performance liquid chromatography (HPLC), 
thin layer chromatography (TLC), and hyperdiffusion chromatography, various 
immunological methods, e.g., fluid or gel precipitin reactions, immunodiffusion, 
immuno-electrophoresis, radioimmunoassays (RIAs), enzyme-linked immunosorbent assays 
(ELISAs), immuno-fluorescent assays, Southern analysis, Northern analysis, dot-blot 
analysis, gel electrophoresis (e.g., SDS-PAGE), nucleic acid or target or signal amplification 
methods, radiolabeling, scintillation counting, and affinity chromatography. 

Amplification of Nucleic Acids 

In practicing the methods of the invention, nucleic acids can be generated and 
reproduced by, e.g., amplification reactions. Amplification reactions can also be used to join 
together nucleic acids to generate fusion protein coding sequences. Amplification reactions 
can also be used to clone sequences into vectors. Amplification reactions can also be used to 
quantify the amount of nucleic acid in a sample, label the nucleic acid (e.g., to apply it to an 
array or a blot), detect the nucleic acid, or quantify the amount of a specific nucleic acid in a 
sample. Message isolated from a cell or a cDNA library can be amplified. The skilled artisan 
can select and design suitable oligonucleotide amplification primers. Amplification methods 
are also well known in the art, and include, e.g., polymerase chain reaction, PCR (see, e.g., 
PCR PROTOCOLS, A GUIDE TO METHODS AND APPLICATIONS, ed. Innis, Academic 
Press, N.Y. (1990) and PCR STRATEGIES (1995), ed. Innis, Academic Press, Inc., N.Y.), 
ligase chain reaction (LCR) (see, e.g., Wu (1989) Genomics 4:560; Landegren (1988) 
Science 241:1077; Barringer (1990) Gene 89:117); transcription amplification (see, e.g., 
Kwoh (1989) Proc. Natl. Acad. Sci. USA 86:1173); and, self-sustained sequence replication 
(see, e.g., Guatelli (1990) Proc. Natl. Acad. Sci. USA 87:1874); Q Beta replicase 
amplification (see, e.g., Smith (1997) J. Clin. Microbiol. 35:1477-1491), automated Q-beta 
replicase amplification assay (see, e.g., Burg (1996) Mol. Cell. Probes 10:257-271) and other 
RNA polymerase mediated techniques (e.g., NASBA, Cangene, Mississauga, Ontario); see 
also Berger (1987) Methods Enzymol. 152:307-316; Sambrook; Ausubel; U.S. Patent Nos. 
4,683,195 and 4,683,202; Sooknanan (1995) Biotechnology 13:563-564. 

COMPOSITIONS AND METHODS FOR MAKING POLYNUCLEOTIDES BY 
ITERATIVE ASSEMBLY OF CODON BUILDING BLOCKS 

The invention provides compositions and methods for making polynucleotides 
by iterative assembly of codon building blocks. The invention provides libraries of synthetic 
or recombinant oligonucleotides comprising multicodons (e.g., dicodons, tricodons, 
tetracodons and the like). The libraries comprise oligonucleotides comprising restriction 
endonuclease restriction sites, e.g., Type-IIS restriction endonuclease restriction sites, 
wherein the restriction endonuclease cuts at a fixed position outside of the recognition 
sequence to generate a single stranded overhang. In one aspect, the multicodon (e.g., 
dicodon) is flanked on both ends by a restriction endonuclease restriction site, e.g., Type-IIS 
restriction endonuclease restriction sites. 

The invention also provides methods for generating any nucleic acid 
sequence, such as synthetic genes, antisense constructs, self-splicing introns or transcripts 
(e.g., ribozymes) and polypeptide coding sequences. The polynucleotide construction 
methods comprise use of libraries of pre-made oligonucleotide building blocks and Type-IIS 
restriction endonucleases. Type-IIS restriction endonucleases, upon digestion of an 
oligonucleotide library member, can generate a three, two or a one base single-stranded 
overhang. Type-IIS restriction endonucleases can include, e.g., SapI, EarI, BseRI, BsgI, 
BpmI, N.AlwI, N.BstNBI, BcgI, BsaXI or BspCNI or an isoschizomer thereof. 

In one aspect, the synthesis starts at a solid support, e.g., a bead, such as a 
magnetic bead, or a capillary, such as a GIGAMATRIX™, to which is immobilized a 
"starter" oligonucleotide fragment. In one aspect, a library of "elongation fragments" is used 
to build the nucleic acid sequence codon by codon. Where the "elongation fragments" 
comprise dicodons, the library has a total of all possible hexameric dicodon sequences, or 
4096 "elongation fragment oligonucleotides." Each "elongation fragment" is "embedded in" 
or flanked by Type-IIS restriction endonuclease recognition sites. Class IIS restriction 
endonucleases have specific recognition sequences and cut at a fixed distance outside the 
recognition site. Digestion produces compatible overhangs. Newly added fragments can be 
used in molar excess as compared to the immobilized oligonucleotide, or growing 
polynucleotide. The molar excess saturates free ends and drives the ligation to completion. 
Unbound material is washed away. The remaining 5' overhangs can be filled in with Klenow 
DNA polymerase to block them from further elongation in a later cycle. Joined fragments 
can be ligated enzymatically. The process can be repeated, adding at least one codon in each 
cycle. The process can be iteratively repeated to produce a polynucleotide of any length. 
The synthesis can be started simultaneously at multiple points within the gene. Synthesized 
partial genes can then be released from the solid support, e.g., by a second set of restriction 
sites in the flanking regions and linked to form a desired full-length product, e.g., a 
polypeptide coding sequence, a transcript with or without 5' and 3' non-coding regions, a 
transcriptional control region, a gene. 

In the methods of the invention, the same set of starter and elongation 
oligonucleotide fragments can be used for every synthesis. The methods of the invention 
can generate polynucleotides with very low error frequencies. The 
oligonucleotide building blocks, including the immobilized "starter" and the "elongation" 
oligonucleotides can be prepared from plasmid DNA as restriction fragments, or, they can be 
generated by nucleic acid amplification (e.g., PCR). 

An exemplary polynucleotide synthetic scheme of the invention uses a library 
of pre-made building blocks to generate any given DNA sequence. The library can include 
all possible di-codon combinations, a total of 4096 clones, to be used with 61 "starter" linker 
oligonucleotide fragments. As described in Example 1, below, in one aspect, each di-codon 
containing oligonucleotide "block" is cloned, sequence verified, PCR amplified or prepped 
from a restriction digest, and pre-cut (pre-digested) with a Type-IIS restriction endonuclease. 
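
The size of such a library can be checked by simple enumeration. The following is a minimal sketch only; it assumes that the 61 "starter" fragments correspond to the 61 sense codons of the standard genetic code (64 triplets minus the three stop codons) and that the dicodon library contains every hexamer over A, C, G and T. The function names are illustrative and are not part of the described methods.

```python
from itertools import product

BASES = "ACGT"
# Stop codons of the standard genetic code; excluding them leaves 61 sense codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def starter_codons():
    """Return the 61 amino acid coding triplets (one per "starter" fragment)."""
    codons = ("".join(p) for p in product(BASES, repeat=3))
    return [c for c in codons if c not in STOP_CODONS]


def dicodon_library():
    """Return every hexameric dicodon sequence (4**6 = 4096 members)."""
    return ["".join(p) for p in product(BASES, repeat=6)]


if __name__ == "__main__":
    print(len(starter_codons()), "starters;", len(dicodon_library()), "dicodons")
```

Enumerating the hexamers in a fixed order also gives each library member a stable position, which can be convenient when the clones are archived in arrays.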

Building genes from oligonucleotides using the methods and libraries of the 
invention can eliminate the requirement of a "parental" or a template DNA. Using a codon 
by codon addition strategy allows custom design of nucleic acid sequences, including genes, 
antisense coding sequences, polypeptide coding sequences and others without the need for a 
"parental" or a template DNA. The methods and libraries of the invention can be used to 
design synthetic nucleic acids such that codon usage towards one or more specific expression 
hosts is optimized. Restriction sites can be designed according to individual cloning needs. 
The methods and libraries of the invention can be used to design and incorporate custom 
transcriptional regulatory elements linked to a coding sequence to achieve a desired level of 
expression or a cell-specific expression pattern. The compositions and methods of the 
invention can be used in conjunction with any other method, including methods using 
"parental" or a template DNA. 

See Figure 1 for a summary of this exemplary iterative codon by codon gene 
building protocol. In one aspect, a target DNA sequence is synthesized on a solid support 
(e.g., a bead or a capillary). As noted in Figure 1, first a "starter" fragment containing at 
least a first codon is immobilized to the support. The "starter" oligonucleotide can be 
immobilized by a "hook" already on the support, e.g., the bead. In the next step, an 
"elongation fragment" comprising a multicodon (at least two codons, or a dicodon) is added. 
In this example the first "elongation fragment" comprises the first two codons. However, in 
other aspects of the invention, the "starter" fragments can comprise at least one codon. The 
joined ends are ligated. The cycle is completed after cutting with a restriction enzyme to 
generate a 5' overhang. In this exemplary method, the restriction enzyme cuts in codon two 
such that the cycle adds one codon in each cycle. 
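
The cycle described above can be modeled in a few lines of code. This is a simplified abstraction, not the laboratory protocol: it ignores the flanking recognition sites, the second strand and all reaction conditions, and it assumes that the codon shared between the growing chain and the incoming dicodon stands in for the compatible three-base overhang. The function name `assemble` is hypothetical.

```python
def assemble(starter, dicodons):
    """Extend `starter` one codon per cycle using overlapping dicodon blocks."""
    if len(starter) < 3 or len(starter) % 3:
        raise ValueError("starter must contain at least one whole codon")
    product = starter.upper()
    for cycle, block in enumerate(dicodons, start=1):
        block = block.upper()
        if len(block) != 6:
            raise ValueError(f"cycle {cycle}: a dicodon must be 6 bases long")
        shared, new_codon = block[:3], block[3:]
        # The exposed overhang of the growing chain is modeled as its last codon.
        if product[-3:] != shared:
            raise ValueError(
                f"cycle {cycle}: overhang {product[-3:]} cannot pair with {shared}"
            )
        product += new_codon  # net gain of exactly one codon per cycle
    return product


if __name__ == "__main__":
    print(assemble("ATG", ["ATGAAA", "AAACTG", "CTGGCG"]))
```

In this model each dicodon overlaps the previous one by a single codon, which is why a dicodon block yields a net addition of one codon per cycle.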

In another aspect, because palindromic sequences may result in self-ligation 
of the fragments, the 5' overhangs can be filled in and converted to blunt ends using Klenow 
DNA polymerase to block them from annealing in later elongation cycles. 
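
Whether a given overhang is palindromic in this sense can be screened computationally before a synthesis is planned. The sketch below assumes that "palindromic" means the overhang equals its own reverse complement; the helper names are illustrative only.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_self_complementary(overhang):
    """True if the overhang could anneal to another copy of itself."""
    return overhang.upper() == reverse_complement(overhang)


if __name__ == "__main__":
    for overhang in ("AT", "GATC", "GCT"):
        print(overhang, is_self_complementary(overhang))
```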

The building block oligonucleotide libraries of the invention can be prepared 
in vectors, thus, the building block oligonucleotide libraries of the invention can comprise a 
cloning vehicle, such as a vector. In the preparation of a library of the invention, the choice 
of the vector and host strain may be important, e.g., so that the vector does not contain restriction sites 
used in the preparation of the "building blocks." A strain that produces unmodified DNA 
may need to be used because some of the class IIS restriction enzymes are sensitive to 
methylation. The "building blocks" can be prepared in a variety of ways, e.g., as restriction 
fragments, by high-fidelity PCR amplification, by synthetic chemistry. 

In one aspect, these methods are performed as an automated, high throughput 
system. Supporting software can be used, e.g., for archiving and/or retrieval of sequenced 
clones, identifying the necessary building blocks in an array of clones or in a library for a 
given nucleic acid sequence. Any software system can be used, e.g., variations of 
DNACARPENTER™ software, Diversa Corporation, San Diego, CA. Any robotic system 
can be used for the automated, high throughput system. 
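
As an illustration of the kind of lookup such supporting software might perform, the sketch below lists the starter and dicodon blocks needed for a target sequence and maps each dicodon to a position in an arrayed library. It is not a description of any existing software: the lexicographic indexing scheme, the 384-well (16 x 24) plate layout and all function names are assumptions made for this example.

```python
BASES = "ACGT"


def dicodon_index(dicodon):
    """Rank of a hexamer in lexicographic order over A, C, G, T (0 to 4095)."""
    if len(dicodon) != 6:
        raise ValueError("a dicodon must be 6 bases long")
    index = 0
    for base in dicodon.upper():
        index = index * 4 + BASES.index(base)  # raises ValueError on a non-ACGT base
    return index


def well_position(index, rows=16, cols=24):
    """Map a library index to (plate number, well label) in a hypothetical layout."""
    plate, offset = divmod(index, rows * cols)
    row, col = divmod(offset, cols)
    return plate + 1, f"{chr(ord('A') + row)}{col + 1}"


def blocks_for(target):
    """Return the starter codon and the located dicodon blocks for `target`."""
    target = target.upper()
    if len(target) < 3 or len(target) % 3:
        raise ValueError("target must consist of whole codons")
    codons = [target[i:i + 3] for i in range(0, len(target), 3)]
    dicodons = [codons[i] + codons[i + 1] for i in range(len(codons) - 1)]
    return codons[0], [(d, well_position(dicodon_index(d))) for d in dicodons]


if __name__ == "__main__":
    starter, located = blocks_for("ATGAAACTG")
    print("starter:", starter)
    for dicodon, (plate, well) in located:
        print(dicodon, "plate", plate, "well", well)
```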

Definitions 

Unless defined otherwise, all technical and scientific terms used herein have 
the meaning commonly understood by a person skilled in the art to which this invention 
belongs. As used herein, the following terms have the meanings ascribed to them unless 
specified otherwise. 

The terms "Type-IIS enzyme" or "Type-IIS restriction endonuclease" include 
all restriction endonucleases and all isoschizomers having an asymmetric recognition 
sequence that cut at a fixed position outside of the recognition sequence on one strand or both 
strands, either 3' or 5' or on both sides of the recognition sequence. Type IIS enzymes can 
recognize asymmetric base sequences and cleave DNA at a specified position up to 20 or 
more base pairs outside of the recognition site. In one aspect, they can cleave a few 
nucleotides away from the recognition sequence (see, e.g., Bath (2001) Biol. Chem. Nov 29; 
epub). Exemplary restriction endonucleases that cut on both sides include BcgI (see, e.g., 
Kong (1998) J. Mol. Biol. 279:823-32), BsaXI and BspCNI. Exemplary restriction 
endonucleases that generate a three base single-stranded overhang include EarI and SapI. 
Exemplary restriction endonucleases that generate a two base single-stranded overhang 
include BseRI, BsgI (see, e.g., Ariazi (1996) Biotechniques 20:446-448, 450-451) and BpmI. 
Exemplary restriction endonucleases that generate a one base single-stranded overhang 
include BmrI, EciI, HphI, MboII (see, e.g., Soundararajan (2001) J. Biol. Chem. Oct 17; 
epub) and MnlI. Exemplary restriction endonucleases that cut only one strand ("nicking 
enzymes") include N.AlwI and N.BstNBI. Any Type IIS enzyme can be used in the methods 
of the invention, including, e.g., BspMI (see, e.g., Gormley (2001) J. Biol. Chem. Nov. 29; 
epub) and BcefI (see, e.g., Venetianer (1988) Nucleic Acids Res. 16:3053-3060). 

"EarI" includes all Type-IIS restriction endonucleases which recognize 5'-CTCTTC-3' 
and all isoschizomers and restriction endonucleases having the same recognition 
sequence and base cleaving pattern (isoschizomers have the same specificity as the 
prototype restriction endonuclease). EarI was first isolated from Enterobacter aerogenes. 
See, e.g., Polisson (1988) Nucleic Acids Res. 16:9872. 

"SapI" includes all Type-IIS restriction endonucleases which recognize the 
non-palindromic 7-base recognition sequence (GCTCTTC) and all isoschizomers and 
restriction endonucleases having the same recognition sequence and base-cleaving pattern. 
See, e.g., Xu (1998) Mol. Gen. Genet. 260:226-231. 
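
The property that a Type-IIS enzyme cuts at a fixed position outside its recognition sequence can be illustrated by locating SapI sites in a sequence and reporting the resulting single-stranded region. This is a sketch under stated assumptions: the cut offsets (one base downstream of the site on the top strand, four bases downstream on the bottom strand, giving a three-base overhang) are taken from general knowledge of this enzyme and are not stated in the text above, only top-strand sites are scanned, and the function name is hypothetical.

```python
SITE = "GCTCTTC"    # SapI recognition sequence as given above
TOP_OFFSET = 1      # assumption: top-strand cut, bases downstream of the site
BOTTOM_OFFSET = 4   # assumption: bottom-strand cut, in top-strand coordinates


def sapi_overhangs(seq):
    """Return (site_start, top_cut, bottom_cut, overhang) for each top-strand site."""
    seq = seq.upper()
    results = []
    start = seq.find(SITE)
    while start != -1:
        site_end = start + len(SITE)
        top_cut = site_end + TOP_OFFSET
        bottom_cut = site_end + BOTTOM_OFFSET
        # Skip sites too close to the end for both cuts to fall inside the sequence.
        if bottom_cut <= len(seq):
            results.append((start, top_cut, bottom_cut, seq[top_cut:bottom_cut]))
        start = seq.find(SITE, start + 1)
    return results


if __name__ == "__main__":
    for hit in sapi_overhangs("AAGCTCTTCAATGCC"):
        print(hit)
```

Because the overhang lies outside the recognition sequence, its three bases can be any sequence, which is what allows a codon to serve as the joining end.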

The term "saturation mutagenesis" or "GSSM" includes a method that uses 
degenerate oligonucleotide primers to introduce point mutations into a polynucleotide, as 
described in detail herein. 

The term "optimized directed evolution system" or "optimized directed 
evolution" includes a method for reassembling fragments of related nucleic acid sequences, 
e.g., related genes, and explained in detail herein. 

The term "synthetic ligation reassembly" or "SLR" includes a method of 
ligating oligonucleotide fragments in a non-stochastic fashion, and explained in detail herein. 

The terms "nucleic acid" and "polynucleotide" as used herein include 
deoxyribonucleotides or ribonucleotides in either single- or double-stranded form. The terms 
encompass all nucleic acids, e.g., oligonucleotides, and modifications or analogues of natural 
nucleotides, e.g., nucleic acids with modified internucleoside linkages. The terms also 
encompass nucleic-acid-like structures with synthetic backbones. Synthetic backbone 
analogues include, e.g., phosphodiester, phosphorothioate, phosphorodithioate, 
methylphosphonate, phosphoramidate, alkyl phosphotriester, sulfamate, 3'-thioacetal, 
methylene(methylimino), 3-N-carbamate, morpholino carbamate, and peptide nucleic acids 
(PNAs); see Oligonucleotides and Analogues, a Practical Approach, edited by F. Eckstein, IRL 
Press at Oxford University Press (1991); Antisense Strategies, Annals of the New York 
Academy of Sciences, Volume 600, Eds. Baserga and Denhardt (NYAS 1992); Milligan (1993) 
J. Med. Chem. 36:1923-1937; Antisense Research and Applications (1993, CRC Press). PNAs 
can contain non-ionic backbones, such as N-(2-aminoethyl) glycine units, see, e.g., U.S. Patent 
No. 5,871,902. Phosphorothioate linkages are described, e.g., in WO 97/03211; WO 96/39154; 
Mata (1997) Toxicol. Appl. Pharmacol. 144:189-197. Other synthetic backbones include 
methyl-phosphonate linkages or alternating methylphosphonate and phosphodiester linkages 
(Strauss-Soukup (1997) Biochemistry 36:8692-8698), and benzylphosphonate linkages 
(Samstag (1996) Antisense Nucleic Acid Drug Dev 6:153-156). Modified internucleoside 
linkages that are resistant to nucleases are described, e.g., in U.S. Patent No. 5,817,781. The 
terms nucleic acid and polynucleotide can be used interchangeably with the terms gene, cDNA, 
mRNA, probe and amplification product. 

Generating and Manipulating Nucleic Acids 

The invention provides libraries of nucleic acids (oligonucleotides and 
polynucleotides) and methods of making and using these libraries. The invention also 
provides methods for making nucleic acids using a codon by codon building technique and 
methods for further manipulation of these nucleic acids, including cloning, sequencing and 
expressing them. Nucleic acids, including individual bases, codons, oligos, and the like, 
needed to make and use the invention can be isolated from a cell, recombinantly generated or 
made synthetically. Sequences can be isolated by, e.g., cloning and expression of cDNA 
libraries, amplification of message or genomic DNA by PCR, and the like. The invention can 
be practiced in conjunction with any method or protocol or device known in the art, which 
are well described in the scientific and patent literature. 

General Techniques 

Nucleic acids (including individual bases, codons, oligos, and the like) used to 
practice this invention, whether RNA, cDNA, genomic DNA, vectors, viruses or hybrids 
thereof, may be isolated from a variety of sources, genetically engineered, amplified, and/or 
expressed/generated recombinantly. Recombinant polypeptides generated from these 
nucleic acids can be individually isolated or cloned and tested for a desired activity. Any 
recombinant expression system can be used, including bacterial, mammalian, yeast, insect or 
plant cell expression systems. 

Alternatively, these nucleic acids (including individual bases, codons, oligos, 
and the like) can be synthesized in vitro by well-known chemical synthesis techniques, as 
described in, e.g., Adams (1983) J. Am. Chem. Soc. 105:661; Belousov (1997) Nucleic 
Acids Res. 25:3440-3444; Frenkel (1995) Free Radic. Biol. Med. 19:373-380; Blommers 
(1994) Biochemistry 33:7886-7896; Narang (1979) Meth. Enzymol. 68:90; Brown (1979) 
Meth. Enzymol. 68:109; Beaucage (1981) Tetra. Lett. 22:1859; U.S. Patent No. 4,458,066. 

Techniques for the manipulation of nucleic acids, such as, e.g., subcloning, 
ligations, labeling probes (e.g., random-primer labeling using Klenow polymerase, nick 
translation, amplification), sequencing, hybridization and the like are well described in the 
scientific and patent literature, see, e.g., Sambrook, ed., Molecular Cloning: a 
Laboratory Manual (2nd ed.), Vols. 1-3, Cold Spring Harbor Laboratory, (1989); 
Current Protocols in Molecular Biology, Ausubel, ed. John Wiley & Sons, Inc., New 
York (1997); Laboratory Techniques in Biochemistry and Molecular Biology: 
Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, 
Tijssen, ed. Elsevier, N.Y. (1993). 

Nucleic acids, oligonucleotides, vectors, capsids, polypeptides, and the like 
can be analyzed and quantified by any of a number of general means well known to those of 
skill in the art. These include, e.g., analytical biochemical methods such as NMR, 
spectrophotometry, radiography, electrophoresis, capillary electrophoresis, high performance 
liquid chromatography (HPLC), thin layer chromatography (TLC), and hyperdiffusion 
chromatography, various immunological methods, e.g. fluid or gel precipitin reactions, 
immunodiffusion, immuno-electrophoresis, radioimmunoassays (RIAs), enzyme-linked 
immunosorbent assays (ELISAs), immuno-fluorescent assays, Southern analysis, Northern 
analysis, dot-blot analysis, gel electrophoresis (e.g., SDS-PAGE), nucleic acid or target or 
signal amplification methods, radiolabeling, scintillation counting, and affinity 
chromatography. 

A variety of enzymes and buffers can be used in the methods and systems of 
the invention, including restriction endonucleases (e.g., type IIS endonucleases), DNA 
ligases, Klenow DNA polymerases and the like. Buffers and reaction conditions, e.g., 
incubation times, temperatures, amount of enzyme and nucleic acid used for each step, can 
be optimized for each step by routine methods. 

Amplification of Nucleic Acids 

In practicing the methods of the invention, nucleic acids and oligonucleotides 
can be manipulated, sequenced, cloned, reproduced and the like by amplification reactions. 
Amplification reactions can be used to splice together nucleic acids or oligonucleotides or 
clone them into vectors. Amplification reactions can also be used to quantify the amount of 
nucleic acid in a sample, label the nucleic acid (e.g., to apply it to an array or a blot), detect 
the nucleic acid, or quantify the amount of a specific nucleic acid in a sample. The skilled 
artisan can select and design suitable oligonucleotide amplification primers. Amplification 
methods are also well known in the art, and include, e.g., polymerase chain reaction, PCR 
(see, e.g., PCR PROTOCOLS, A GUIDE TO METHODS AND APPLICATIONS, ed. Innis, 
Academic Press, N.Y. (1990) and PCR STRATEGIES (1995), ed. Innis, Academic Press, 
Inc., N.Y.), ligase chain reaction (LCR) (see, e.g., Wu (1989) Genomics 4:560; Landegren 
(1988) Science 241:1077; Barringer (1990) Gene 89:117); transcription amplification (see, 
e.g., Kwoh (1989) Proc. Natl. Acad. Sci. USA 86:1173); and, self-sustained sequence 
replication (see, e.g., Guatelli (1990) Proc. Natl. Acad. Sci. USA 87:1874); Q Beta replicase 
amplification (see, e.g., Smith (1997) J. Clin. Microbiol. 35:1477-1491), automated Q-beta 
replicase amplification assay (see, e.g., Burg (1996) Mol. Cell. Probes 10:257-271) and other 
RNA polymerase mediated techniques (e.g., NASBA, Cangene, Mississauga, Ontario); see 
also Berger (1987) Methods Enzymol. 152:307-316; Sambrook; Ausubel; U.S. Patent Nos. 
4,683,195 and 4,683,202; Sooknanan (1995) Biotechnology 13:563-564. 

Substrate surfaces 

The invention provides a method for building a polynucleotide by iterative 
assembly of multicodon, e.g., dicodon, building blocks comprising providing a substrate 
surface and immobilizing an oligonucleotide to the substrate surface. Any substrate surface 
can be used to practice the invention. For example, substrate surfaces can be of rigid, 
semi-rigid or flexible material. Substrate surfaces can be flat or planar, be shaped as wells, raised 
regions, etched trenches, pores, beads, filaments, or the like. Substrate surfaces can be of any 
material upon which a "capture probe" can be directly or indirectly bound. For example, 
suitable materials can include paper, glass (see, e.g., U.S. Patent No. 5,843,767), ceramics, 
quartz or other crystalline substrates (e.g. gallium arsenide), metals, metalloids, 
polyacryloylmorpholide, various plastics and plastic copolymers, Nylon™, Teflon™, 
polyethylene, polypropylene, poly(4-methylbutene), polystyrene, polystyrene/latex, 
polymethacrylate, poly(ethylene terephthalate), rayon, nylon, poly(vinyl butyrate), 
polyvinylidene difluoride (PVDF) (see, e.g., U.S. Patent No. 6,024,872), silicones (see, e.g., 
U.S. Patent No. 6,096,817), polyformaldehyde (see, e.g., U.S. Patent Nos. 4,355,153; 
4,652,613), cellulose (see, e.g., U.S. Patent No. 5,068,269), cellulose acetate (see, e.g., U.S. 
Patent No. 6,048,457), nitrocellulose, various membranes and gels (e.g., silica aerogels, see, 
e.g., U.S. Patent No. 5,795,557), paramagnetic or superparamagnetic microparticles (see, 
e.g., U.S. Patent No. 5,939,261) and the like. Silane (e.g., mono- and dihydroxyalkylsilanes, 
aminoalkyltrialkoxysilanes, 3-aminopropyl-triethoxysilane, 3-aminopropyltrimethoxysilane) 
can provide a hydroxyl functional group for reaction with an amine functional group. 

In one aspect, the invention provides a set of beads, e.g., magnetic beads 
(including, e.g., paramagnetic or superparamagnetic microparticles), comprising 61 "starter" 
oligonucleotides, one bead for each possible amino acid coding triplet. In another aspect, the 
invention provides a system comprising these 61 "starter" oligonucleotides and 4^6, or 4096, 
possible hexameric dicodon oligonucleotides. As discussed above, these dicodon 
oligonucleotides are "embedded" in, or flanked by, a framework of endonuclease recognition 
sites, e.g., class IIS restriction sites. The 61 "starter" oligonucleotides can be immobilized 
onto modalities other than beads, e.g., wells, strands, capillary tubes (see below, e.g., 
capillary arrays, such as the GIGAMATRIX™), troughs and the like. 

Capillary Arrays 

Capillary arrays, such as the GIGAMATRIX™, Diversa Corporation, San 
Diego, CA, can be used as a substrate surface. Capillary arrays provide another system for 
immobilizing and building nucleic acids using the methods of the invention. Once 
constructed, the immobilized newly constructed polynucleotides can be screened and 
expressed within the capillary array. A plurality of capillaries can be formed into an array of 
adjacent capillaries, wherein each capillary comprises at least one wall defining a lumen for 
retaining an oligonucleotide. The apparatus can further include interstitial material disposed 
between adjacent capillaries in the array, and one or more reference indicia formed within 
the interstitial material. A capillary for screening a sample, wherein the capillary is adapted 
for being bound in an array of capillaries, can include a first wall defining a lumen for 
retaining the sample, and a second wall formed of a filtering material, for filtering excitation 
energy provided to the lumen to excite the sample. See, e.g., WO0138583. 

For example, a nucleic acid, e.g., a codon-comprising library member, can be 
introduced in a first component into at least a portion of a capillary of a capillary array. 
Each capillary of the capillary array can comprise at least one wall defining a lumen for 
retaining the first component. An air bubble can be introduced into the capillary behind the 
first component. A second component (e.g., a different buffer, an endonuclease enzyme, a 
codon-comprising library member) can be introduced into the capillary, wherein the second 
component is separated from the first component by the air bubble. A sample (e.g., 
comprising a codon-comprising library member) can be introduced as a first liquid labeled 
with a detectable particle into a capillary of a capillary array, wherein each capillary of the 
capillary array comprises at least one wall defining a lumen for retaining the first liquid and 
the detectable particle, and wherein the at least one wall is coated with a binding material for 
binding the detectable particle to the at least one wall. The method can further include 
removing the first liquid from the capillary tube, wherein the bound detectable particle is 
maintained within the capillary, and introducing a second liquid into the capillary tube. 

The capillary array can include a plurality of individual capillaries comprising at 
least one outer wall defining a lumen. The outer wall of the capillary can be one or more walls 
fused together. Similarly, the wall can define a lumen that is cylindrical, square, hexagonal or 
any other geometric shape so long as the walls form a lumen for retention of a liquid or sample. 
The capillaries of the capillary array can be held together in close proximity to form a planar 
structure. The capillaries can be bound together, by being fused (e.g., where the capillaries are 
made of glass), glued, bonded, or clamped side-by-side. The capillary array can be formed of 
any number of individual capillaries, for example, a range from 100 to 4,000,000 capillaries. A 
capillary array can form a microtiter plate having about 100,000 or more individual capillaries 
bound together. 

Modification of Nucleic Acids 

The nucleic acids generated by the methods of the invention can be altered by 
any means, including saturation mutagenesis, an optimized directed evolution system, 
synthetic ligation reassembly, or a combination thereof, as described herein. Random or 
stochastic methods, or, non-stochastic, or "directed evolution," methods can be used. 
Further, as discussed above, the nucleic acids generated by the methods of the invention can 
be purified by the methods described herein, e.g., the methods for purifying double-stranded 
polynucleotides lacking base pair mismatches, insertion/deletion loops and/or a nucleotide 
gap or gaps as described herein. The nucleic acids generated by the methods of the invention 
can be altered by a method comprising gene site saturated mutagenesis (GSSM), error-prone 
PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR 
mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, 
exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, synthetic 
ligation reassembly (SLR) and a combination thereof. The nucleic acids generated by the 
methods of the invention can be altered by a method comprising recombination, recursive 
sequence recombination, phosphothioate-modified DNA mutagenesis, uracil-containing 
template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, 
repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, 
deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, 
artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and 
a combination thereof. 

Methods for random mutation of genes are well known in the art, see, e.g., 
U.S. Patent No. 5,830,696. Mutagens include, e.g., ultraviolet light or gamma irradiation, or 
a chemical mutagen, e.g., mitomycin, nitrous acid, photoactivated psoralens, alone or in 
combination, to induce DNA breaks amenable to repair by recombination. Other chemical 
mutagens include, for example, sodium bisulfite, nitrous acid, hydroxylamine, hydrazine or 
formic acid. Other mutagens are analogues of nucleotide precursors, e.g., nitrosoguanidine, 
5-bromouracil, 2-aminopurine, or acridine. These agents can be added to a PCR reaction in 
place of the nucleotide precursor thereby mutating the sequence. Intercalating agents such as 
proflavine, acriflavine, quinacrine and the like can also be used. 

Techniques in molecular biology can be used, e.g., random PCR mutagenesis, 
see, e.g., Rice (1992) Proc. Natl. Acad. Sci. USA 89:5467-5471; or, combinatorial multiple 
cassette mutagenesis, see, e.g., Crameri (1995) Biotechniques 18:194-196. Alternatively, 
nucleic acids, e.g., genes, can be reassembled after random, or "stochastic," fragmentation, 
see, e.g., U.S. Patent Nos. 6,291,242; 6,287,862; 6,287,861; 5,955,358; 5,830,721; 
5,824,514; 5,811,238; 5,605,793. Polypeptides encoded by isolated and/or modified nucleic 
acids can be screened for an activity before their reinsertion into the cell by, e.g., using a 
capillary array platform. See, e.g., U.S. Patent Nos. 6,280,926; 5,939,250. 

Saturation mutagenesis, or, GSSM 

In one aspect of the invention, non-stochastic gene modification, a "directed 
evolution process," can be used to modify nucleic acids generated by the methods of the 
invention. Variations of this method have been termed "gene site-saturation mutagenesis," 
"site-saturation mutagenesis," "saturation mutagenesis" or simply "GSSM." It can be used in 
combination with other mutagenization processes. See, e.g., U.S. Patent Nos. 6,171,820; 
6,238,884. In one aspect, GSSM comprises providing a template polynucleotide and a 
plurality of oligonucleotides, wherein each oligonucleotide comprises a sequence 
homologous to the template polynucleotide, thereby targeting a specific sequence of the 
template polynucleotide, and a sequence that is a variant of the homologous gene; generating 
progeny polynucleotides comprising non-stochastic sequence variations by replicating the 
template polynucleotide with the oligonucleotides, thereby generating polynucleotides 
comprising homologous gene sequence variations. 

In another aspect, site-saturation mutagenesis can be used together with 
another stochastic or non-stochastic means to vary sequence, e.g., synthetic ligation 
reassembly (see below), shuffling, chimerization, recombination and other mutagenizing 
processes and mutagenizing agents. This invention provides for the use of any mutagenizing 
process(es), including saturation mutagenesis, in an iterative manner. 

Synthetic Ligation Reassembly (SLR) 

Another non-stochastic gene modification, a "directed evolution process," that 
can be used to modify nucleic acids generated by the methods of the invention has 
been termed "synthetic ligation reassembly" or simply "SLR." SLR is a method of ligating 
oligonucleotide fragments together non-stochastically. This method differs from stochastic 
oligonucleotide shuffling in that the nucleic acid building blocks are not shuffled, 
concatenated or chimerized randomly, but rather are assembled non-stochastically. See, e.g., 
U.S. Patent Application Serial No. (USSN) 09/332,835 entitled "Synthetic Ligation 
Reassembly in Directed Evolution" and filed on June 14, 1999 ("USSN 09/332,835"). In one 
aspect, SLR comprises the following steps: (a) providing a template polynucleotide, 
wherein the template polynucleotide comprises sequence encoding a homologous gene; (b) 
providing a plurality of building block polynucleotides, wherein the building block 
polynucleotides are designed to cross-over reassemble with the template polynucleotide at a 
predetermined sequence, and a building block polynucleotide comprises a sequence that is a 
variant of the homologous gene and a sequence homologous to the template polynucleotide 
flanking the variant sequence; (c) combining a building block polynucleotide with a 
template polynucleotide such that the building block polynucleotide cross-over reassembles 
with the template polynucleotide to generate polynucleotides comprising homologous gene 
sequence variations. 

SLR does not depend on the presence of high levels of homology between 
polynucleotides to be rearranged. Thus, this method can be used to non-stochastically 
generate libraries (or sets) of progeny molecules comprised of over 10^100 different chimeras. 
SLR can be used to generate libraries comprised of over 10^1000 different progeny chimeras. 
Thus, aspects of the present invention include non-stochastic methods of producing a set of 
finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by 
design. This method includes the steps of generating by design a plurality of specific nucleic 
acid building blocks having serviceable mutually compatible ligatable ends, and assembling 
these nucleic acid building blocks, such that a designed overall assembly order is achieved. 
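
The designed assembly order described above can be illustrated with a minimal sketch. It is not the method of the cited application; the building block sequences, the end labels and the function names are hypothetical, and a ligatable end is modeled simply as a label that must match between adjacent blocks (real overhangs are complementary single strands).

```python
from itertools import product
from math import prod

# Hypothetical building blocks, written as (left_end, body, right_end).
# Two blocks are treated as compatible when the right end of one equals
# the left end of the next; this equality test is a simplification.
POSITIONS = [
    [("", "ATGGCT", "GGAC"), ("", "ATGGCA", "GGAC")],          # position 1 variants
    [("GGAC", "CTGAAA", "TTCG"), ("GGAC", "CTGCGT", "TTCG")],  # position 2 variants
    [("TTCG", "GATTAA", "")],                                  # position 3 variant
]


def assemble(blocks):
    """Join blocks in the given order, rejecting any incompatible junction."""
    sequence = ""
    previous_right = ""
    for left, body, right in blocks:
        if left != previous_right:
            raise ValueError(f"incompatible ends: {previous_right!r} vs {left!r}")
        sequence += left + body  # the shared end is written once per junction
        previous_right = right
    return sequence


def library(positions):
    """Enumerate every chimera permitted by the designed order."""
    return [assemble(choice) for choice in product(*positions)]


def library_size(positions):
    """Number of chimeras: the product of the variant counts per position."""
    return prod(len(variants) for variants in positions)


chimeras = library(POSITIONS)
print(library_size(POSITIONS), chimeras[0])
```

Because the order is fixed by the ends rather than by chance, the library size is the product of the number of variants offered at each position, which is how a modest number of building blocks can yield a very large set of chimeras.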

Optimized Directed Evolution System 

Nucleic acids generated by the methods of the invention can also be modified 
by a method comprising an optimized directed evolution system. Optimized directed 
evolution is directed to the use of repeated cycles of reductive reassortment, recombination 
and selection that allow for the directed molecular evolution of nucleic acids through 
recombination. Optimized directed evolution allows generation of a large population of 
evolved chimeric sequences, wherein the generated population is significantly enriched for 
sequences that have a predetermined number of crossover events. A crossover event is a 
point in a chimeric sequence where a shift in sequence occurs from one parental variant to 
another parental variant. Such a point is normally at the juncture of where oligonucleotides 
from two parents are ligated together to form a single sequence. This method allows 
calculation of the correct concentrations of oligonucleotide sequences so that the final 
chimeric population of sequences is enriched for the chosen number of crossover events. 
This provides more control over choosing chimeric variants having a predetermined number 
of crossover events. 
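
The definition of a crossover event given above can be made concrete with a short sketch. It only counts crossovers and filters an enumerated population; it does not reproduce the calculation of oligonucleotide concentrations, which is not detailed here. The parent labels, the number of segments and the function names are hypothetical.

```python
from itertools import product


def count_crossovers(parents):
    """Count junctions where adjacent segments come from different parents.

    `parents` lists the parental origin of each segment in order, e.g. "AABA".
    """
    return sum(1 for a, b in zip(parents, parents[1:]) if a != b)


def select_by_crossovers(chimeras, wanted):
    """Keep only chimeras with exactly `wanted` crossover events."""
    return [c for c in chimeras if count_crossovers(c) == wanted]


PARENTS = "ABC"   # three hypothetical parental variants
N_SEGMENTS = 4    # each chimera is built from four ordered segments

# Every possible assignment of a parent to each segment.
all_chimeras = ["".join(p) for p in product(PARENTS, repeat=N_SEGMENTS)]
two_crossovers = select_by_crossovers(all_chimeras, 2)

print(len(all_chimeras), len(two_crossovers))
```

In this toy population, fewer than half of all possible chimeras carry exactly two crossovers, which illustrates why enriching for a chosen crossover number reduces the number of variants to be screened.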

In addition, this method provides a convenient means for exploring a 
tremendous amount of the possible protein variant space. By using an optimized directed 
evolution system, a population of nucleic acid molecules can be enriched for those variants 
that have a particular number of crossover events. One method for creating a chimeric 
progeny polynucleotide sequence is to create oligonucleotides corresponding to fragments or 
portions of each parental sequence. Each oligonucleotide can include a unique region of 
overlap so that mixing the oligonucleotides together results in a new variant that has each 
oligonucleotide fragment assembled in the correct order. Additional information can also be 
found in WO0077262; WO0058517; WO0046344. 

CHIMERIC ANTIGEN BINDING MOLECULES AND METHODS FOR MAKING AND 
USING THEM 

The invention provides novel chimeric antigen binding polypeptides, nucleic 
acids encoding them and methods for making and using them. This invention also provides 
methods for further modifying these chimeric antigen binding polypeptides by altering the 
nucleic acids that encode them by saturation mutagenesis, an optimized directed evolution 
system, synthetic ligation reassembly, or a combination thereof. These modifications can 
focus on regions such as antigen binding sites or specific domains or fragments of antibodies, e.g., 
variable or heavy domains, Fab or Fc domains or CDRs. 

The invention also provides libraries of chimeric antigen binding polypeptides 
encoded by the nucleic acid libraries of the invention and generated by the methods of the 
invention. These antigen binding polypeptides can be analyzed using any liquid or solid state 
screening method, e.g., phage display, ribosome display, using capillary array platforms, e.g., 
GIGAMATRIX™, and the like. 

The chimeric antigen binding polypeptides generated by the methods of the 
invention can be used in vitro, e.g., to isolate, measure amounts of, or identify antigens or in 
vivo, e.g., to treat or diagnose various diseases and conditions, or to modulate, stimulate or 
attenuate an immune response. The antigen binding polypeptides of the invention can be 
manipulated to be catalytic antibodies, see, e.g., U.S. Patent Nos. 6,326,179; 5,439,812; 
5,302,516; 5,187,086; 5,126,258. 

This invention also pertains to the field of vaccines. The libraries and 
methods of the invention provide manipulated antigen binding polypeptides, including 
polypeptide antibodies and genetic vaccines comprising nucleic acids. Specific antigen 
binding polypeptides can be selected for optimization by the methods of the invention for a 
particular vaccination goal. Antibodies can be designed for administration to generate 
passive immunity. Nucleic acids encoding these antigen binding polypeptides can be used as 
genetic vaccines. In one aspect, this invention provides methods for improving the efficacy 
of genetic vaccines by providing antigen binding polypeptides that facilitate targeting of a 
genetic vaccine to a particular tissue or cell type of interest. 

This invention pertains to the field of biologic therapeutics by providing 
polypeptides comprising antigen binding sites, such as antibodies, with modified (e.g., 
increased or decreased) affinity for antigen. For example, the methods of the invention 
provide antibodies of altered or enhanced affinities for an antigen for use, e.g., in 
immunotherapeutics or diagnostics. The antibodies generated by the methods of the 
invention can be administered therapeutically to slow the growth of or kill cells, such as 
cancer cells, or, to stimulate cell division, e.g., for enhancing an immune response or for 
tissue regeneration, or, to alter any biological mechanism or response. For example, 
administration of antibodies that bind to immune effector or regulatory cells, or to 
lymphokines or cytokines, can alter, e.g., upregulate, stimulate or attenuate, a humoral or a 
cellular immune response. This invention also can be used to develop efficient immune 
responses against a broad range of antigens. 

This invention pertains to the field of modulation of immune responses by 
providing chimeric antigen binding polypeptides specific for molecules that are involved in 
the stimulation and regulation of the immune response, including, e.g., Fc receptors, surface 
expressed (membrane bound) immunoglobulins, T cell receptors or Class I and Class II 
major histocompatibility (MHC) molecules. For example, by modulating expression of one 
or more of these molecules the methods of the invention can modulate autoreactive TCR 
reactions, generate an abated or attenuated immune response to a self antigen or generate an 
enhanced immune response, e.g., to a pathogen. 

This invention also relates to the field of protein engineering. The invention 
uses directed evolution methods for modifying polynucleotides encoding the chimeric 
antigen binding polypeptides of the invention. Methods of mutagenesis are used to generate 
novel polynucleotides encoding chimeric antigen binding polypeptides that are altered, or 
"improved." These methods include non-stochastic polynucleotide chimerization and non- 
stochastic site-directed point mutagenesis. 

In one aspect, this invention relates to a method of generating a progeny 
library, or set, of chimeric antigen binding polynucleotide(s) by means that are synthetic and 
non-stochastic. The design of the progeny antigen binding polynucleotide(s) is derived by 
analysis of a parental set of antigen binding polynucleotides and/or of the polypeptides 
correspondingly encoded by the parental polynucleotides. In another aspect, this invention 
relates to a method of performing site-directed mutagenesis using means that are exhaustive, 
systematic, and non-stochastic. 

This invention also includes selecting from among a generated set of progeny 
chimeric antigen binding molecules a subset comprised of particularly desirable species, 
including by a process termed end-selection, which subset may then be screened further. 
This invention also includes screening a set of antigen binding polynucleotides. The antigen 
binding polypeptides can be re-designed to have a useful property, such as having an 
increased affinity (e.g., "affinity enrichment") or decreased affinity for an antigen, or gaining 
or changing its ability to act as an enzyme. 

The methods of the invention provide for "affinity enrichment" of a chimeric 
antibody or an antigen binding site. Antibody constant regions (e.g., Fc domains) can also be 
"affinity enriched" for their ability to specifically bind to an Fc receptor or a complement 
polypeptide. Very large sets, or libraries, of variant antibodies, including, e.g., CDRs, Fabs, 
Fcs, and single-chain antibodies, can be generated and screened for binding to ligand (e.g., 
antigen, complement, receptor, and the like). In one aspect, the variant polynucleotide is 
isolated and further manipulated by a method described herein, e.g., shuffled to recombine 
combinatorially the amino acid sequence of the selected polypeptides, peptide(s) or 
predetermined portions thereof. Thus, antibodies, antigen binding sites, Fc domains, and the 
like can be generated having a desired binding affinity for a molecule. The peptide or 
antibody can then be synthesized in bulk by conventional means for any suitable use (e.g., as 
a therapeutic pharmaceutical, a diagnostic agent, or as an in vitro reagent). 

DEFINITIONS 

Unless defined otherwise, all technical and scientific terms used herein have 
the meaning commonly understood by a person skilled in the art to which this invention 
belongs. As used herein, the following terms have the meanings ascribed to them unless 
specified otherwise. 

The term "saturation mutagenesis" or "GSSM" includes a method that uses 
degenerate oligonucleotide primers to introduce point mutations into a polynucleotide, as 
described in detail, below. In one aspect, the methods of the invention further comprise non- 
stochastic modification of all or a part of the sequence of a chimeric antibody coding 
sequence of the invention by "saturation mutagenesis" or "GSSM." 

The term "optimized directed evolution system" or "optimized directed 
evolution" includes a method for reassembling fragments of related nucleic acid sequences, 
e.g., related genes, and explained in detail, below. In one aspect, the methods of the 
invention further comprise non-stochastic modification of all or a part of the sequence of a 
chimeric antibody coding sequence of the invention by "optimized directed evolution 
system." 

The term "synthetic ligation reassembly" or "SLR" includes a method of 
ligating oligonucleotide fragments in a non-stochastic fashion, and explained in detail, below. 
In one aspect, the methods of the invention further comprise non-stochastic modification of 
all or a part of the sequence of a chimeric antibody coding sequence of the invention by 
"synthetic ligation reassembly" or "SLR." 

The term "antibody" includes a peptide or polypeptide derived from, modeled 
after or substantially encoded by an immunoglobulin gene or immunoglobulin genes, or 
fragments thereof, capable of specifically binding an antigen or epitope, see, e.g., 
Fundamental Immunology, Third Edition, W.E. Paul, ed., Raven Press, N.Y. (1993); Wilson 
(1994) J. Immunol. Methods 175:267-73; Yarmush (1992) J. Biochem. Biophys. Methods 
25:85-97. The term antibody includes antigen-binding portions, i.e., "antigen binding sites," 
(e.g., fragments, subsequences, complementarity determining regions (CDRs)) that retain 
capacity to bind antigen, including (i) a Fab fragment, a monovalent fragment consisting of 
the VL, VH, CL and CH1 domains; (ii) a F(ab')2 fragment, a bivalent fragment comprising 
two Fab fragments linked by a disulfide bridge at the hinge region; (iii) a Fd fragment 
consisting of the VH and CH1 domains; (iv) a Fv fragment consisting of the VL and VH 
domains of a single arm of an antibody; (v) a dAb fragment (Ward et al., (1989) Nature 
341:544-546), which consists of a VH domain; and (vi) an isolated complementarity 
determining region (CDR). Single chain antibodies are also included by reference in the 
term "antibody." 

Generating and Manipulating Nucleic Acids 

The invention provides libraries of chimeric nucleic acids encoding a plurality 
of chimeric antigen binding polypeptides and methods for making these libraries. Making 
these libraries comprises providing nucleic acids encoding lambda light chain variable region 
polypeptide domains (Vλ), kappa light chain variable region polypeptide domains (Vk), J 
region polypeptide domains (VJ), lambda light chain constant region polypeptide domains 
(Cλ), kappa light chain constant region polypeptide domains (Ck), antibody heavy chain 
variable region polypeptide domains (VH), D region polypeptide domains (VD), J region 
polypeptide domains (VJ) and heavy chain constant region polypeptide domains (CH). 

These and other nucleic acids needed to make and use the invention can be 
isolated from a cell, recombinantly generated or made synthetically. The sequences can be 
isolated by, e.g., cloning and expression of cDNA libraries, amplification of message or 
genomic DNA by PCR, and the like. In practicing the methods of the invention, homologous 
genes can be modified by manipulating a template nucleic acid, as described herein. The 
invention can be practiced in conjunction with any method or protocol or device known in 
the art, which are well described in the scientific and patent literature. 

General Techniques 

The nucleic acids used to practice this invention, whether RNA, cDNA, 
genomic DNA, vectors, viruses or hybrids thereof, may be isolated from a variety of sources, 
genetically engineered, amplified, and/or expressed/generated recombinantly. Recombinant 
polypeptides generated from these nucleic acids can be individually isolated or cloned and 
tested for a desired activity. Any recombinant expression system can be used, including 
bacterial, mammalian, yeast, insect or plant cell expression systems. 

Alternatively, these nucleic acids can be synthesized in vitro by well-known 
chemical synthesis techniques, as described in, e.g., Adams (1983) J. Am. Chem. Soc. 
105:661; Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel (1995) Free Radic. 
Biol. Med. 19:373-380; Blommers (1994) Biochemistry 33:7886-7896; Narang (1979) Meth. 
Enzymol. 68:90; Brown (1979) Meth. Enzymol. 68:109; Beaucage (1981) Tetra. Lett. 
22:1859; U.S. Patent No. 4,458,066. 

Techniques for the manipulation of nucleic acids, such as, e.g., subcloning, 
ligations, labeling probes (e.g., random-primer labeling using Klenow polymerase, nick 
translation, amplification), sequencing, hybridization and the like are well described in the 
scientific and patent literature, see, e.g., Sambrook, ed., Molecular Cloning: a 
Laboratory Manual (2nd ed.), Vols. 1-3, Cold Spring Harbor Laboratory, (1989); 
Current Protocols in Molecular Biology, Ausubel, ed. John Wiley & Sons, Inc., New 
York (1997); Laboratory Techniques in Biochemistry and Molecular Biology: 
Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, 
Tijssen, ed. Elsevier, N.Y. (1993). 

Nucleic acids, vectors, capsids, polypeptides, and the like can be analyzed and 
quantified by any of a number of general means well known to those of skill in the art. These 
include, e.g., analytical biochemical methods such as NMR, spectrophotometry, radiography, 
electrophoresis, capillary electrophoresis, high performance liquid chromatography (HPLC), 
thin layer chromatography (TLC), and hyperdiffusion chromatography, various 
immunological methods, e.g., fluid or gel precipitin reactions, immunodiffusion, immuno-electrophoresis, 
radioimmunoassays (RIAs), enzyme-linked immunosorbent assays 
(ELISAs), immuno-fluorescent assays, Southern analysis, Northern analysis, dot-blot 
analysis, gel electrophoresis (e.g., SDS-PAGE), nucleic acid or target or signal amplification 
methods, radiolabeling, scintillation counting, and affinity chromatography. 

Another useful means of obtaining and manipulating nucleic acids used to 
practice the methods of the invention is to clone from genomic samples, and, if desired, 
screen and re-clone inserts isolated or amplified from, e.g., genomic clones or cDNA clones. 
Sources of nucleic acid used in the methods of the invention include genomic or cDNA 
libraries contained in, e.g., mammalian artificial chromosomes (MACs), see, e.g., U.S. Patent 
Nos. 5,721,118; 6,025,155; human artificial chromosomes, see, e.g., Rosenfeld (1997) Nat. 
Genet. 15:333-335; yeast artificial chromosomes (YAC); bacterial artificial chromosomes 
(BAC); P1 artificial chromosomes, see, e.g., Woon (1998) Genomics 50:306-316; P1-derived 
vectors (PACs), see, e.g., Kern (1997) Biotechniques 23:120-124; cosmids, recombinant 
viruses, phages or plasmids. 

Amplification of Nucleic Acids 

In practicing the methods of the invention, nucleic acids encoding lambda 
light chain variable region polypeptide domains (Vλ), kappa light chain variable region 
polypeptide domains (Vk), J region polypeptide domains (VJ), lambda light chain constant 
region polypeptide domains (Cλ), kappa light chain constant region polypeptide domains 
(Ck), antibody heavy chain variable region polypeptide domains (VH), D region polypeptide 
domains (VD), J region polypeptide domains (VJ) and heavy chain constant region 
polypeptide domains (CH) can be generated and reproduced by, e.g., amplification reactions. 
Amplification reactions can also be used to join together these domains or splice the chimeric 
nucleic acids of the invention into vectors. Amplification reactions can also be used to 
quantify the amount of nucleic acid in a sample, label the nucleic acid (e.g., to apply it to an 
array or a blot), detect the nucleic acid, or quantify the amount of a specific nucleic acid in a 
sample. In one aspect of the invention, message isolated from a cell or a cDNA library is 
amplified. The skilled artisan can select and design suitable oligonucleotide amplification 
primers. Amplification methods are also well known in the art, and include, e.g., polymerase 
chain reaction, PCR (see, e.g., PCR PROTOCOLS, A GUIDE TO METHODS AND 
APPLICATIONS, ed. Innis, Academic Press, N.Y. (1990) and PCR STRATEGIES (1995), 
ed. Innis, Academic Press, Inc., N.Y.); ligase chain reaction (LCR) (see, e.g., Wu (1989) 
Genomics 4:560; Landegren (1988) Science 241:1077; Barringer (1990) Gene 89:117); 
transcription amplification (see, e.g., Kwoh (1989) Proc. Natl. Acad. Sci. USA 86:1173); 
self-sustained sequence replication (see, e.g., Guatelli (1990) Proc. Natl. Acad. Sci. 
USA 87:1874); Q Beta replicase amplification (see, e.g., Smith (1997) J. Clin. Microbiol. 
35:1477-1491), automated Q-beta replicase amplification assay (see, e.g., Burg (1996) Mol. 
Cell. Probes 10:257-271) and other RNA polymerase mediated techniques (e.g., NASBA, 
Cangene, Mississauga, Ontario); see also Berger (1987) Methods Enzymol. 152:307-316; 
Sambrook; Ausubel; U.S. Patent Nos. 4,683,195 and 4,683,202; Sooknanan (1995) 
Biotechnology 13:563-564. 

Immunoglobulin coding sequences 

The invention provides chimeric antigen binding polypeptides including 
lambda light chain variable region polypeptide domains (Vλ), kappa light chain variable 
region polypeptide domains (Vk), J region polypeptide domains (VJ), lambda light chain 
constant region polypeptide domains (Cλ), kappa light chain constant region polypeptide 
domains (Ck), antibody heavy chain variable region polypeptide domains (VH), D region 
polypeptide domains (VD), J region polypeptide domains (VJ) and heavy chain constant 
region polypeptide domains (CH) and the chimeric nucleic acids encoding them. These 
sequences can be modeled from, cloned or amplified from or directly isolated from any gene 
or message, including cDNA, sequence. 

Any cell can be used as a source of antigen binding polypeptide coding 
sequence, including lymphocytes, such as B cells. Rearranged or activated B cells or plasma 
cells in the circulation, a lymph node or the spleen can be used. Any vertebrate can be a cell 
source. The repertoire of rearranged genes can be biased for a pre-determined binding 
specificity. For example, an animal can be immunized prior to isolating rearranged B cells or 
plasma cells. This generates a repertoire enriched for genetic material producing a ligand 
binding polypeptide of high affinity. 

Alternatively, nucleic acids encoding immunoglobulin sequences can be 
modeled after already characterized coding sequences, many of which are known and 
characterized in the art, as, e.g., Genbank sequences, or, for sequences or methods to isolate 
such sequences, see, e.g., U.S. Patent Nos. 6,319,690; 6,291,161; 6,258,529; 6,214,984; 
6,204,023; 6,068,840; 6,057,421; 5,891,438; 5,869,619; 5,861,499; 5,851,801; 5,821,123. 

Modification of Nucleic Acids 

In one aspect of the methods of the invention, chimeric antigen binding 
polypeptide coding sequences are modified to alter the properties of the polypeptides they 
encode. The nucleic acids can be altered by any means, including saturation mutagenesis, an 
optimized directed evolution system, synthetic ligation reassembly, or a combination thereof, 
as described herein. Random or stochastic methods, or, non-stochastic, or "directed 
evolution," methods can be used. These nucleic acid modifying procedures can target 
specific domains, e.g., lambda light chain variable region polypeptide domains (Vλ), kappa 
light chain variable region polypeptide domains (Vk), J region polypeptide domains (VJ), 
lambda light chain constant region polypeptide domains (Cλ), kappa light chain constant 
region polypeptide domains (Ck), antibody heavy chain variable region polypeptide domains 
(VH), D region polypeptide domains (VD), J region polypeptide domains (VJ) or heavy 
chain constant region polypeptide domains (CH). They can also specifically target regions 
encoding target antigen binding sites or CDRs. 

Further, the nucleic acids encoding these antibodies can be purified by the 
methods described herein, e.g., the methods for purifying double-stranded polynucleotides 
lacking base pair mismatches, insertion/deletion loops and/or a nucleotide gap or gaps as 
described herein. 

The nucleic acids encoding the chimeric antigen binding polypeptide coding 
sequences can be modified by a method comprising gene site saturated mutagenesis (GSSM), 
error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual 
PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble 
mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, 
synthetic ligation reassembly (SLR) and a combination thereof. The nucleic acids generated 
by the methods of the invention can be altered by a method comprising recombination, 
recursive sequence recombination, phosphothioate-modified DNA mutagenesis, uracil- 
containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair 
mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic 
mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification 
mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer 
creation and a combination thereof. 

Methods for random mutation of genes are well known in the art, see, e.g., 
U.S. Patent No. 5,830,696. For example, mutagens can be used to randomly mutate a gene. 
Mutagens include, e.g., ultraviolet light or gamma irradiation, or a chemical mutagen, e.g., 
mitomycin, nitrous acid, photoactivated psoralens, alone or in combination, to induce DNA 
breaks amenable to repair by recombination. Other chemical mutagens include, for example, 
sodium bisulfite, nitrous acid, hydroxylamine, hydrazine or formic acid. Other mutagens are 
analogues of nucleotide precursors, e.g., nitrosoguanidine, 5-bromouracil, 2-aminopurine, or 
acridine. These agents can be added to a PCR reaction in place of the nucleotide precursor 
thereby mutating the sequence. Intercalating agents such as proflavine, acriflavine, 
quinacrine and the like can also be used. 

Techniques in molecular biology can be used, e.g., random PCR mutagenesis, 
see, e.g., Rice (1992) Proc. Natl. Acad. Sci. USA 89:5467-5471; or, combinatorial multiple 
cassette mutagenesis, see, e.g., Crameri (1995) Biotechniques 18:194-196. Alternatively, 
nucleic acids, e.g., genes, can be reassembled after random, or "stochastic," fragmentation, 
see, e.g., U.S. Patent Nos. 6,291,242; 6,287,862; 6,287,861; 5,955,358; 5,830,721; 
5,824,514; 5,811,238; 5,605,793. Polypeptides encoded by isolated and/or modified nucleic 
acids can be screened for an activity before their reinsertion into the cell by, e.g., using a 
capillary array platform. See, e.g., U.S. Patent Nos. 6,280,926; 5,939,250. 

Saturation mutagenesis, or, GSSM 

In one aspect of the invention, non-stochastic gene modification, a "directed 
evolution process," can be used to modify chimeric antigen binding polypeptide coding 
sequences. Variations of this method have been termed "gene site-saturation mutagenesis," 
"site-saturation mutagenesis," "saturation mutagenesis" or simply "GSSM." It can be used in 
combination with other mutagenization processes. See, e.g., U.S. Patent Nos. 6,171,820; 
6,238,884. In one aspect, GSSM comprises providing a template polynucleotide and a 
plurality of oligonucleotides, wherein each oligonucleotide comprises a sequence 
homologous to the template polynucleotide, thereby targeting a specific sequence of the 
template polynucleotide, and a sequence that is a variant of the homologous gene; generating 
progeny polynucleotides comprising non-stochastic sequence variations by replicating the 
template polynucleotide with the oligonucleotides, thereby generating polynucleotides 
comprising homologous gene sequence variations. 

In one aspect, codon primers containing a degenerate N,N,G/T sequence are 
used to introduce point mutations into a polynucleotide, so as to generate a set of progeny 
polypeptides in which a full range of single amino acid substitutions is represented at each 
amino acid position, e.g., an amino acid residue in an enzyme active site or ligand binding 
site targeted to be modified. These oligonucleotides can comprise a contiguous first 
homologous sequence, a degenerate N,N,G/T sequence, and, optionally, a second 
homologous sequence. The downstream progeny translational products from the use of such 
oligonucleotides include all possible amino acid changes at each amino acid site along the 
polypeptide, because the degeneracy of the N,N,G/T sequence includes codons for all 20 
amino acids. 
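
The oligonucleotide layout described above (a first homologous sequence, a degenerate N,N,G/T triplet, and an optional second homologous sequence) can be sketched as follows. This is an illustration under stated assumptions, not a primer design protocol: the template sequence, the flank length and the function names are hypothetical, and the triplet is written with the IUPAC code K (G or T) in the third position.

```python
from itertools import product

# IUPAC codes needed for an N,N,G/T ("NNK") cassette.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def make_primer(template, codon_index, flank=9):
    """Replace one codon of the template with NNK, keeping homologous flanks.

    `codon_index` is zero-based; `flank` is the number of template
    nucleotides kept on each side (a hypothetical default).
    """
    start = codon_index * 3
    left = template[max(0, start - flank):start]
    right = template[start + 3:start + 3 + flank]
    return left + "NNK" + right


def expand(degenerate):
    """List every concrete sequence represented by a degenerate sequence."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in degenerate))]


TEMPLATE = "ATGGCTAAAGGTCTGGAAACC"  # hypothetical 7-codon template
primer = make_primer(TEMPLATE, codon_index=3)
variants = expand(primer)

print(primer, len(variants))
```

The flanks target the primer to one codon of the template, while the degenerate triplet expands to 4 x 4 x 2 = 32 concrete oligonucleotides at that position.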

In one aspect, one such degenerate oligonucleotide (comprised of, e.g., one 
degenerate N,N,G/T cassette) is used for subjecting each original codon in a parental 
polynucleotide template to a full range of codon substitutions. In another aspect, at least two 
degenerate cassettes are used - either in the same oligonucleotide or not, for subjecting at 
least two original codons in a parental polynucleotide template to a full range of codon 
substitutions. For example, more than one N,N,G/T sequence can be contained in one 
oligonucleotide to introduce amino acid mutations at more than one site. This plurality of 
N,N,G/T sequences can be directly contiguous, or separated by one or more additional 
nucleotide sequence(s). In another aspect, oligonucleotides serviceable for introducing 
additions and deletions can be used either alone or in combination with the codons containing 
an N,N,G/T sequence, to introduce any combination or permutation of amino acid additions, 
deletions, and/or substitutions. 

In one aspect, simultaneous mutagenesis of two or more contiguous amino 
acid positions is done using an oligonucleotide that contains contiguous N,N,G/T triplets, i.e. 
a degenerate (N,N,G/T)n sequence. In another aspect, degenerate cassettes having less 
degeneracy than the N,N,G/T sequence are used. For example, it may be desirable in some 
instances to use (e.g. in an oligonucleotide) a degenerate triplet sequence comprised of only 
one N, where said N can be in the first, second or third position of the triplet. Any other bases 
including any combinations and permutations thereof can be used in the remaining two 
positions of the triplet. Alternatively, it may be desirable in some instances to use (e.g. in an 
oligo) a degenerate N,N,N triplet sequence. 






WO 03/060084 



PCT/US03/01189 



In one aspect, use of degenerate triplets (e.g., N,N,G/T triplets) allows for
systematic and easy generation of a full range of possible natural amino acids (for a total of 
20 amino acids) into each and every amino acid position in a polypeptide (in alternative 
aspects, the methods also include generation of less than all possible substitutions per amino 
acid residue, or codon, position). For example, for a 100 amino acid polypeptide, 2000 
distinct species (i.e. 20 possible amino acids per position X 100 amino acid positions) can be 
generated. Through the use of an oligonucleotide or set of oligonucleotides containing a 
degenerate N,N,G/T triplet, 32 individual sequences can code for all 20 possible natural 
amino acids. Thus, in a reaction vessel in which a parental polynucleotide sequence is 
subjected to saturation mutagenesis using at least one such oligonucleotide, there are 
generated 32 distinct progeny polynucleotides encoding 20 distinct polypeptides. In contrast, 
the use of a non-degenerate oligonucleotide in site-directed mutagenesis leads to only one 
progeny polypeptide product per reaction vessel. Nondegenerate oligonucleotides can 
optionally be used in combination with degenerate primers disclosed; for example, 
nondegenerate oligonucleotides can be used to generate specific point mutations in a working 
polynucleotide. This provides one means to generate specific silent point mutations, point 
mutations leading to corresponding amino acid changes, and point mutations that cause the 
generation of stop codons and the corresponding expression of polypeptide fragments. 
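
By way of illustration only, the counts stated above can be checked with a short Python sketch. It expands the N,N,G/T triplet against the standard genetic code and counts the codons and the amino acids they encode. The names `CODON_TABLE` and `expand_nng_t` are hypothetical and are not part of the disclosure. Under the standard code, the 32 codons also include one stop codon (TAG), which the sketch reports separately.

```python
from itertools import product

# Standard genetic code, laid out with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_nng_t():
    """Return every codon matching N,N,G/T (third base restricted to G or T)."""
    return ["".join(p) for p in product(BASES, BASES, "GT")]


codons = expand_nng_t()
amino_acids = {CODON_TABLE[c] for c in codons if CODON_TABLE[c] != "*"}
stops = [c for c in codons if CODON_TABLE[c] == "*"]

n_codons = len(codons)            # distinct progeny polynucleotides per site
n_amino_acids = len(amino_acids)  # distinct amino acids encoded per site

# Single-site variants for a 100 amino acid polypeptide, as in the text:
# 20 possible amino acids per position x 100 positions.
single_site_variants = 20 * 100
```

The third-position restriction to G or T is what halves the 64 possible codons to 32 while still covering every amino acid.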

In one aspect, each saturation mutagenesis reaction vessel contains 
polynucleotides encoding at least 20 progeny polypeptide molecules such that all 20 natural 
amino acids are represented at the one specific amino acid position corresponding to the 
codon position mutagenized in the parental polynucleotide (other aspects use less than all 20 
natural combinations). The 32-fold degenerate progeny polypeptides generated from each 
saturation mutagenesis reaction vessel can be subjected to clonal amplification (e.g. cloned 
into a suitable host, e.g., E. coli host, using, e.g., an expression vector) and subjected to 
expression screening. When an individual polypeptide is identified (e.g., by screening) to 
display a favorable change in property (when compared to the parental polypeptide, such as 
increased affinity or avidity to an antigen), it can be sequenced to identify the 
correspondingly favorable amino acid substitution contained therein.

In one aspect, upon mutagenizing each and every amino acid position in a 
parental polypeptide using saturation mutagenesis as disclosed herein, favorable amino acid 
changes may be identified at more than one amino acid position. One or more new progeny
molecules can be generated that contain a combination of all or part of these favorable amino 
acid substitutions. For example, if 2 specific favorable amino acid changes are identified in 
each of 3 amino acid positions in a polypeptide, the permutations include 3 possibilities at 
each position (no change from the original amino acid, and each of two favorable changes) 
and 3 positions. Thus, there are 3 x 3 x 3 or 27 total possibilities, including 7 that were 
previously examined - 6 single point mutations (i.e. 2 at each of three positions) and no 
change at any position. 
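
The arithmetic of this example can be reproduced with a minimal Python sketch. The position names and substitution labels below are hypothetical placeholders; `None` stands for "no change from the original amino acid".

```python
from itertools import product

# Hypothetical favorable substitutions found at three positions.
# None stands for "no change from the original amino acid".
options = {
    "pos_A": [None, "A1", "A2"],
    "pos_B": [None, "B1", "B2"],
    "pos_C": [None, "C1", "C2"],
}

# Every way of picking one option per position: 3 x 3 x 3.
combos = list(product(*options.values()))

# Combinations already examined: the parent plus the six single mutants.
already_seen = [c for c in combos if sum(x is not None for x in c) <= 1]

# Combinations carrying two or three changes, not yet examined.
new_combos = [c for c in combos if sum(x is not None for x in c) >= 2]
```

Of the 27 combinations, the 20 in `new_combos` are the ones that combine favorable changes from different positions.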

In another aspect, site-saturation mutagenesis can be used together with 
another stochastic or non-stochastic means to vary sequence, e.g., synthetic ligation 
reassembly (see below), shuffling, chimerization, recombination and other mutagenizing 
processes and mutagenizing agents. This invention provides for the use of any mutagenizing 
process(es), including saturation mutagenesis, in an iterative manner. 

Synthetic Ligation Reassembly (SLR) 

Another non-stochastic gene modification, a "directed evolution process," that
can be used to modify a chimeric antigen binding polypeptide coding sequence has
been termed "synthetic ligation reassembly," or simply "SLR." SLR is a method of ligating 
oligonucleotide fragments together non-stochastically. This method differs from stochastic 
oligonucleotide shuffling in that the nucleic acid building blocks are not shuffled, 
concatenated or chimerized randomly, but rather are assembled non-stochastically. See, e.g., 
U.S. Patent Application Serial No. (USSN) 09/332,835 entitled "Synthetic Ligation 
Reassembly in Directed Evolution" and filed on June 14, 1999 ("USSN 09/332,835"). In one 
aspect, SLR comprises the following steps: (a) providing a template polynucleotide, 
wherein the template polynucleotide comprises sequence encoding a homologous gene; (b) 
providing a plurality of building block polynucleotides, wherein the building block 
polynucleotides are designed to cross-over reassemble with the template polynucleotide at a 
predetermined sequence, and a building block polynucleotide comprises a sequence that is a 
variant of the homologous gene and a sequence homologous to the template polynucleotide 
flanking the variant sequence; and (c) combining a building block polynucleotide with a
template polynucleotide such that the building block polynucleotide cross-over reassembles
with the template polynucleotide to generate polynucleotides comprising homologous gene
sequence variations. 

SLR does not depend on the presence of high levels of homology between 
polynucleotides to be rearranged. Thus, this method can be used to non-stochastically 
generate libraries (or sets) of progeny molecules comprised of over 10^100 different chimeras.
SLR can be used to generate libraries comprised of over 10^1000 different progeny chimeras.
Thus, aspects of the present invention include non-stochastic methods of producing a set of
finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by
design. This method includes the steps of generating by design a plurality of specific nucleic
acid building blocks having serviceable mutually compatible ligatable ends, and assembling
these nucleic acid building blocks, such that a designed overall assembly order is achieved.

The mutually compatible ligatable ends of the nucleic acid building blocks to 
be assembled are considered to be "serviceable" for this type of ordered assembly if they 
enable the building blocks to be coupled in predetermined orders. Thus the overall assembly
order in which the nucleic acid building blocks can be coupled is specified by the design of
the ligatable ends. If more than one assembly step is to be used, then the overall assembly
order in which the nucleic acid building blocks can be coupled is also specified by the
sequential order of the assembly step(s). In one aspect, the annealed building pieces are
treated with an enzyme, such as a ligase (e.g. T4 DNA ligase), to achieve covalent bonding of
the building pieces.
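
As a non-limiting sketch of how the design of the ligatable ends can specify the assembly order, the Python example below chains building blocks by matching the right end of each block to the left end of the next. The function `assembly_order`, the block names, and the end labels are hypothetical; the labels merely stand in for mutually compatible single-stranded overhangs.

```python
def assembly_order(blocks, start_end):
    """Chain building blocks by matching each right end to the next left end.

    blocks: dict mapping a block name to (left_end, right_end). The end
    labels are hypothetical stand-ins for compatible ligatable ends.
    start_end: label of the left end of the first block.
    """
    by_left = {}
    for name, (left, _right) in blocks.items():
        if left in by_left:
            raise ValueError(f"end {left!r} is not unique, order is ambiguous")
        by_left[left] = name

    order, current = [], start_end
    while current in by_left:
        name = by_left.pop(current)   # each block is used at most once
        order.append(name)
        current = blocks[name][1]     # continue from this block's right end

    if by_left:
        raise ValueError("some blocks cannot be reached from the start end")
    return order


# Blocks listed out of order on purpose; the ends alone fix the order.
blocks = {
    "block3": ("e2", "e3"),
    "block1": ("e0", "e1"),
    "block2": ("e1", "e2"),
}
order = assembly_order(blocks, "e0")
```

Because every end label is unique, only one chain is possible, which mirrors the statement that serviceable ends permit coupling only in the predetermined order.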

In one aspect, the design of the oligonucleotide building blocks is obtained by 
analyzing a set of progenitor nucleic acid sequence templates that serve as a basis for 
producing a progeny set of finalized chimeric polynucleotide molecules. These parental 
oligonucleotide templates thus serve as a source of sequence information that aids in the
design of the nucleic acid building blocks that are to be mutagenized, e.g., chimerized or
shuffled.

In one aspect of this method, the sequences of a plurality of parental nucleic 
acid templates are aligned in order to select one or more demarcation points. The 
demarcation points can be located at an area of homology, and are comprised of one or more 
nucleotides. These demarcation points are preferably shared by at least two of the progenitor
templates. The demarcation points can thereby be used to delineate the boundaries of
oligonucleotide building blocks to be generated in order to rearrange the parental
polynucleotides. The demarcation points identified and selected in the progenitor molecules 
serve as potential chimerization points in the assembly of the final chimeric progeny 
molecules. A demarcation point can be an area of homology (comprised of at least one 
homologous nucleotide base) shared by at least two parental polynucleotide sequences. 
Alternatively, a demarcation point can be an area of homology that is shared by at least half 
of the parental polynucleotide sequences, or, it can be an area of homology that is shared by 
at least two thirds of the parental polynucleotide sequences. Even more preferably a
serviceable demarcation point is an area of homology that is shared by at least three fourths
of the parental polynucleotide sequences, or, it can be shared by almost all of the parental
polynucleotide sequences. In one aspect, a demarcation point is an area of homology that is 
shared by all of the parental polynucleotide sequences. 
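
A minimal Python sketch of selecting candidate demarcation points from aligned parental sequences follows. It assumes the sequences have already been aligned to equal length, with '-' marking gaps, and it treats a single shared nucleotide column as the smallest possible area of homology. The function name, the `min_fraction` parameter, and the example sequences are hypothetical.

```python
def demarcation_candidates(aligned, min_fraction=1.0):
    """Return alignment columns where one nucleotide is shared widely enough.

    aligned: equal-length, pre-aligned parental sequences ('-' marks a gap).
    min_fraction: fraction of all parents that must share the nucleotide.
    A column qualifies only if at least two parents share the nucleotide.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for col in range(len(aligned[0])):
        column = [s[col] for s in aligned if s[col] != "-"]
        if not column:
            continue
        top = max(column.count(b) for b in set(column))
        if top >= 2 and top / len(aligned) >= min_fraction:
            hits.append(col)
    return hits


# Three hypothetical parental sequences, already aligned.
parents = ["ATGGCTAAAGGT", "ATGGCAAAGGGT", "ATGTCTAAAGGC"]
shared_by_all = demarcation_candidates(parents, 1.0)
shared_by_two_thirds = demarcation_candidates(parents, 2 / 3)
```

Lowering `min_fraction` corresponds to the alternatives recited above, in which a demarcation point is shared by half, two thirds, or three fourths of the parental sequences rather than by all of them.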

In one aspect, a ligation reassembly process is performed exhaustively in 
order to generate an exhaustive library of progeny chimeric polynucleotides. In other words, 
all possible ordered combinations of the nucleic acid building blocks are represented in the 
set of finalized chimeric nucleic acid molecules. At the same time, in another embodiment, 
the assembly order (i.e. the order of assembly of each building block in the 5' to 3' sequence
of each finalized chimeric nucleic acid) in each combination is by design (or non-stochastic) 
as described above. Because of the non-stochastic nature of this invention, the possibility of 
unwanted side products is greatly reduced. 
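
For illustration, an exhaustive library in which the 5' to 3' order is fixed by design can be enumerated as below. The segment labels are hypothetical: each inner list holds the variants of one segment, one contributed by each of three parents.

```python
from itertools import product

# Hypothetical building blocks: one list per position in the 5' to 3' order,
# each holding the variant of that segment contributed by each parent.
segments = [
    ["A1", "B1", "C1"],
    ["A2", "B2", "C2"],
    ["A3", "B3", "C3"],
    ["A4", "B4", "C4"],
]

# Every ordered combination: one block per position, positions never swapped.
library = ["-".join(choice) for choice in product(*segments)]

# The library size is the product of the number of variants at each position.
expected_size = 1
for seg in segments:
    expected_size *= len(seg)
```

With three variants at each of four positions the library holds 81 chimeras. The size multiplies with every added position, which is how very large libraries arise from a modest number of building blocks.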

In another aspect, the ligation reassembly method is performed systematically. 
For example, the method is performed in order to generate a systematically 
compartmentalized library of progeny molecules, with compartments that can be screened 
systematically, e.g. one by one. In other words this invention provides that, through the 
selective and judicious use of specific nucleic acid building blocks, coupled with the 
selective and judicious use of sequentially stepped assembly reactions, a design can be 
achieved where specific sets of progeny products are made in each of several reaction 
vessels. This allows a systematic examination and screening procedure to be performed. 
Thus, these methods allow a potentially very large number of progeny molecules to be 
examined systematically in smaller groups. 

Because of their ability to perform chimerizations in a manner that is highly
flexible yet exhaustive and systematic as well, particularly when there is a low level of
homology among the progenitor molecules, these methods provide for the generation of a
library (or set) comprised of a large number of progeny molecules. Because of the non-stochastic
nature of the instant ligation reassembly invention, the progeny molecules
generated preferably comprise a library of finalized chimeric nucleic acid molecules having
an overall assembly order that is chosen by design.

The saturation mutagenesis and optimized directed evolution methods also 
can be used to generate these amounts of different progeny molecular species. 

It is appreciated that the invention provides freedom of choice and control
regarding the selection of demarcation points, the size and number of the nucleic acid
building blocks, and the size and design of the couplings. It is appreciated, furthermore, that
the requirement for intermolecular homology is highly relaxed for the operability of this
invention. In fact, demarcation points can even be chosen in areas of little or no
intermolecular homology. For example, because of codon wobble, i.e. the degeneracy of
codons, nucleotide substitutions can be introduced into nucleic acid building blocks without
altering the amino acid originally encoded in the corresponding progenitor template.
Alternatively, a codon can be altered such that the coding for an original amino acid is
altered. This invention provides that such substitutions can be introduced into the nucleic
acid building block in order to increase the incidence of intermolecularly homologous
demarcation points and thus to allow an increased number of couplings to be achieved among
the building blocks, which in turn allows a greater number of progeny chimeric molecules to
be generated.
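
One way to picture the use of codon degeneracy to create homology is the Python sketch below. It assumes two in-frame coding sequences of equal length and rewrites the second so that it reuses the first sequence's codon wherever both codons encode the same amino acid. The function names and example sequences are hypothetical, and the approach is only one simple possibility rather than the method of the disclosure.

```python
# Standard genetic code, laid out with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate an in-frame coding sequence to one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def harmonize(reference, other):
    """Silently recode `other` to match `reference` where residues agree."""
    if len(reference) != len(other) or len(other) % 3:
        raise ValueError("expected equal-length, in-frame coding sequences")
    out = []
    for i in range(0, len(other), 3):
        ref_codon, codon = reference[i:i + 3], other[i:i + 3]
        same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[codon]
        out.append(ref_codon if same_residue else codon)
    return "".join(out)


def identical_positions(a, b):
    """Count nucleotide positions at which two sequences agree."""
    return sum(x == y for x, y in zip(a, b))


parent_a = "GCTAAAGGTTTC"   # encodes A K G F
parent_b = "GCAAAGGGCTAC"   # encodes A K G Y
rewritten_b = harmonize(parent_a, parent_b)
```

In this example the rewritten sequence still encodes the same polypeptide as `parent_b`, but it now shares three complete codons with `parent_a`, giving more positions that could serve as demarcation points.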

In another aspect, the synthetic nature of the step in which the building blocks 
are generated allows the design and introduction of nucleotides (e.g., one or more
nucleotides, which may be, for example, codons or introns or regulatory sequences) that can
later be optionally removed in an in vitro process (e.g. by mutagenesis) or in an in vivo
process (e.g. by utilizing the gene splicing ability of a host organism). It is appreciated that
in many instances the introduction of these nucleotides may also be desirable for many other
reasons in addition to the potential benefit of creating a serviceable demarcation point.

Thus, according to another aspect, a nucleic acid building block can be used to
introduce an intron. Thus, functional introns may be introduced into a man-made gene
manufactured according to the methods described herein. The artificially introduced
intron(s) can be functional in a host cell for gene splicing much in the way that naturally-occurring
introns serve functionally in gene splicing.

Optimized Directed Evolution System 

In practicing the methods of the invention, chimeric nucleic acids encoding an 
antigen binding polypeptide can also be modified by a method comprising an optimized 
directed evolution system. Optimized directed evolution is directed to the use of repeated 
cycles of reductive reassortment, recombination and selection that allow for the directed
molecular evolution of nucleic acids through recombination. Optimized directed evolution
allows generation of a large population of evolved chimeric sequences, wherein the generated 
population is significantly enriched for sequences that have a predetermined number of 
crossover events. 

A crossover event is a point in a chimeric sequence where a shift in sequence
occurs from one parental variant to another parental variant. Such a point is normally at the
juncture of where oligonucleotides from two parents are ligated together to form a single
sequence. This method allows calculation of the correct concentrations of oligonucleotide
sequences so that the final chimeric population of sequences is enriched for the chosen
number of crossover events. This provides more control over choosing chimeric variants
having a predetermined number of crossover events. 

In addition, this method provides a convenient means for exploring a 
tremendous amount of the possible protein variant space in comparison to other systems. 
Previously, if one generated, for example, 10^13 chimeric molecules during a reaction, it
would be extremely difficult to test such a high number of chimeric variants for a particular
activity. Moreover, a significant portion of the progeny population would have a very high
number of crossover events that resulted in proteins that were less likely to have increased
levels of a particular activity. By using these methods, the population of chimeric molecules
can be enriched for those variants that have a particular number of crossover events. Thus,
although one can still generate 10^13 chimeric molecules during a reaction, each of the
molecules chosen for further analysis most likely has, for example, only three crossover
events. Because the resulting progeny population can be skewed to have a predetermined
number of crossover events, the boundaries on the functional variety between the chimeric
molecules are reduced. This provides a more manageable number of variables when
calculating which oligonucleotide from the original parental polynucleotides might be
responsible for affecting a particular trait.

One method for creating a chimeric progeny polynucleotide sequence is to 
create oligonucleotides corresponding to fragments or portions of each parental sequence. 
Each oligonucleotide preferably includes a unique region of overlap so that mixing the
oligonucleotides together results in a new variant that has each oligonucleotide fragment
assembled in the correct order. Additional information can also be found in USSN
09/332,835. The number of oligonucleotides generated for each parental variant bears a
relationship to the total number of resulting crossovers in the chimeric molecule that is
ultimately created. For example, three parental nucleotide sequence variants might be
provided to undergo a ligation reaction in order to find a chimeric variant having, for
example, greater activity at high temperature. As one example, a set of 50 oligonucleotide
sequences can be generated corresponding to each portion of each parental variant.
Accordingly, during the ligation reassembly process there could be up to 50 crossover events
within each of the chimeric sequences. The probability that each of the generated chimeric
polynucleotides will contain oligonucleotides from each parental variant in alternating order
is very low. If each oligonucleotide fragment is present in the ligation reaction in the same
molar quantity, it is likely that in some positions oligonucleotides from the same parental
polynucleotide will ligate next to one another and thus not result in a crossover event. If the
concentration of each oligonucleotide from each parent is kept constant during any ligation
step in this example, there is a 1/3 chance (assuming 3 parents) that an oligonucleotide from
the same parental variant will ligate within the chimeric sequence and produce no crossover.
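
The 1/3 figure can be reproduced with a short Python sketch under a simplifying assumption that is not stated in the text: each fragment at a position is drawn independently, with probability proportional to the concentration of its parent. The function name and the example concentrations are hypothetical.

```python
from fractions import Fraction


def same_parent_probability(concentrations):
    """Chance that two adjacent fragments come from the same parent.

    Assumes each fragment is drawn independently with probability
    proportional to its parent's concentration (a simplifying assumption).
    concentrations: relative integer amounts, one per parent.
    """
    total = sum(concentrations)
    return sum(Fraction(c, total) ** 2 for c in concentrations)


# Three parents at equal molar quantity: no crossover with probability 1/3.
equal = same_parent_probability([1, 1, 1])

# Skewing the mixture toward one parent makes crossovers less frequent.
skewed = same_parent_probability([8, 1, 1])
```

Under this assumption the chance of a crossover at a given junction is one minus the returned value, so an 8:1:1 mixture lowers it from 2/3 to 17/50.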

Accordingly, a probability density function (PDF) can be determined to 
predict the population of crossover events that are likely to occur during each step in a 
ligation reaction given a set number of parental variants, a number of oligonucleotides 
corresponding to each variant, and the concentrations of each variant during each step in the
ligation reaction. The statistics and mathematics behind determining the PDF are described
below. By utilizing these methods, one can calculate such a probability density function, and
thus enrich the chimeric progeny population for a predetermined number of crossover events
resulting from a particular ligation reaction. Moreover, a target number of crossover events
can be predetermined, and the system then programmed to calculate the starting quantities of
each parental oligonucleotide during each step in the ligation reaction to result in a
probability density function that centers on the predetermined number of crossover events.
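
The idea of centering the crossover distribution on a target can be sketched in Python with a deliberately simplified model. This is not the PDF calculation of the disclosure. The sketch assumes independent junctions (so the crossover count is binomial), 50 fragments joined at 49 junctions, and a mixture in which one dominant parent has fraction f while the remaining parents share the rest equally. All function names are hypothetical.

```python
from math import comb


def crossover_pdf(n_junctions, p_cross):
    """Binomial probability of k crossovers, for k = 0..n_junctions."""
    return [
        comb(n_junctions, k) * p_cross**k * (1 - p_cross) ** (n_junctions - k)
        for k in range(n_junctions + 1)
    ]


def dominant_fraction_for_target(n_junctions, n_parents, target):
    """Fraction f for one dominant parent, others sharing 1 - f equally,
    chosen by bisection so the expected crossover count equals target."""

    def expected(f):
        rest = (1 - f) / (n_parents - 1)
        p_same = f**2 + (n_parents - 1) * rest**2
        return n_junctions * (1 - p_same)

    lo, hi = 1 / n_parents, 1.0  # expected() decreases over this range
    if not expected(hi) <= target <= expected(lo):
        raise ValueError("target is outside the reachable range")
    for _ in range(60):
        mid = (lo + hi) / 2
        if expected(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Equal molar quantities of three parents: crossover chance 2/3 per junction.
pdf_equal = crossover_pdf(49, 2 / 3)

# Fraction of the dominant parent that centers the distribution near three.
f = dominant_fraction_for_target(49, 3, target=3)
```

In this model, equal quantities give roughly 33 expected crossovers over 49 junctions, whereas a strongly skewed mixture brings the expected count down to the chosen target of three.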

These methods are directed to the use of repeated cycles of reductive 
reassortment, recombination and selection that allow for the directed molecular evolution of 
a nucleic acid encoding a polypeptide through recombination. This system allows
generation of a large population of evolved chimeric sequences, wherein the generated
population is significantly enriched for sequences that have a predetermined number of
crossover events. A crossover event is a point in a chimeric sequence where a shift in
sequence occurs from one parental variant to another parental variant. Such a point is
normally at the juncture of where oligonucleotides from two parents are ligated together to
form a single sequence. The method allows calculation of the correct concentrations of
oligonucleotide sequences so that the final chimeric population of sequences is enriched for
the chosen number of crossover events. This provides more control over choosing chimeric 
variants having a predetermined number of crossover events. 

In addition, these methods provide a convenient means for exploring a 
tremendous amount of the possible protein variant space in comparison to other systems. By
using the methods described herein, the population of chimeric molecules can be enriched
for those variants that have a particular number of crossover events. Thus, although one can
still generate 10^13 chimeric molecules during a reaction, each of the molecules chosen for
further analysis most likely has, for example, only three crossover events. Because the
resulting progeny population can be skewed to have a predetermined number of crossover
events, the boundaries on the functional variety between the chimeric molecules are reduced.
This provides a more manageable number of variables when calculating which 
oligonucleotide from the original parental polynucleotides might be responsible for affecting 
a particular trait. 

In one aspect, the method creates a chimeric progeny polynucleotide sequence 
by creating oligonucleotides corresponding to fragments or portions of each parental
sequence. Each oligonucleotide preferably includes a unique region of overlap so that
mixing the oligonucleotides together results in a new variant that has each oligonucleotide
fragment assembled in the correct order. See also USSN 09/332,835.

The number of oligonucleotides generated for each parental variant bears a 
relationship to the total number of resulting crossovers in the chimeric molecule that is 
ultimately created. For example, three parental nucleotide sequence variants might be 
provided to undergo a ligation reaction in order to find a chimeric variant having, for 
example, greater activity at high temperature. As one example, a set of 50 oligonucleotide 
sequences can be generated corresponding to each portion of each parental variant.
Accordingly, during the ligation reassembly process there could be up to 50 crossover events 
within each of the chimeric sequences. The probability that each of the generated chimeric 
polynucleotides will contain oligonucleotides from each parental variant in alternating order 
is very low. If each oligonucleotide fragment is present in the ligation reaction in the same 
molar quantity it is likely that in some positions oligonucleotides from the same parental 
polynucleotide will ligate next to one another and thus not result in a crossover event. If the 
concentration of each oligonucleotide from each parent is kept constant during any ligation 
step in this example, there is a 1/3 chance (assuming 3 parents) that an oligonucleotide from
the same parental variant will ligate within the chimeric sequence and produce no crossover. 

Accordingly, a probability density function (PDF) can be determined to 
predict the population of crossover events that are likely to occur during each step in a 
ligation reaction given a set number of parental variants, a number of oligonucleotides 
corresponding to each variant, and the concentrations of each variant during each step in the 
ligation reaction. The statistics and mathematics behind determining the PDF are described
below. One can calculate such a probability density function, and thus enrich the chimeric 
progeny population for a predetermined number of crossover events resulting from a 
particular ligation reaction. Moreover, a target number of crossover events can be 
predetermined, and the system then programmed to calculate the starting quantities of each 
parental oligonucleotide during each step in the ligation reaction to result in a probability 
density function that centers on the predetermined number of crossover events. 

Determining Crossover Events 

Embodiments of the invention include a system and software that receive a 
desired crossover probability density function (PDF), the number of parent genes to be 
reassembled, and the number of fragments in the reassembly as inputs. The output of this
program is a "fragment PDF" that can be used to determine a recipe for producing
reassembled genes, and the estimated crossover PDF of those genes. The processing
described herein is preferably performed in MATLAB® (The Mathworks, Natick,
Massachusetts), a programming language and development environment for technical
computing.

Iterative Processes 

In practicing the methods of the invention, the process can be iteratively 
repeated. For example, a nucleic acid (or, the nucleic acid) responsible for an altered antigen
binding property is identified, re-isolated, again modified, and re-tested for binding activity. The
process can be iteratively repeated until a desired polypeptide is engineered. The invention is
not limited to only a single round of screening. This iterative practice of determining which
oligonucleotides are most related to the desired activity allows more efficient exploration of all
of the possible protein variants that might provide a particular property or activity.

Mutagenized Oligonucleotides

While the optimized directed evolution method can use oligonucleotides that 
have a 100% fidelity to their parent polynucleotide sequence, this level of fidelity is not 
required. For example, if a set of three related parental polynucleotides is chosen to
undergo ligation reassembly in order to create, e.g., an antibody with an altered binding
affinity or specificity, a set of oligonucleotides having unique overlapping regions can be
synthesized by conventional methods. However, a set of mutagenized oligonucleotides could
also be synthesized. These mutagenized oligonucleotides are preferably designed to encode 
silent, conservative, or non-conservative amino acids. 

The choice to enter a silent mutation might be made to, for example, add a
region of nucleotide homology between two fragments, but not affect the final translated protein. A
non-conservative or conservative substitution is made to determine how such a change alters
the function of the resultant polypeptide. This can be done if, for example, it is determined
that mutations in one particular oligonucleotide fragment were responsible for increasing the
activity of a peptide. By synthesizing mutagenized oligonucleotides (e.g., those having a
different nucleotide sequence than their parent), one can explore, in a controlled manner, how
resulting modifications to the peptide or protein sequence affect the activity of the peptide or
polypeptide.

Another method for creating variants of a nucleic acid sequence using 
mutagenized fragments includes first aligning a plurality of nucleic acid sequences to
determine demarcation sites within the variants that are conserved in a majority of said
variants, but not conserved in all of said variants. A set of first sequence fragments of the
conserved nucleic acid sequences is then generated, wherein the fragments bind to one
another at the demarcation sites. A second set of fragments of the not conserved nucleic acid
sequences is then generated by, for example, a nucleic acid synthesizer. However, the not
conserved sequences are generated to have mutations at their demarcation site so that the
second fragments have the same nucleotide sequence at the demarcation sites as said first
fragments. This allows the not conserved sequences to still hybridize during the ligation
reaction to the other parental sequences. Once the fragments are generated, a desired number
of crossover events can be selected for each of the variants. The quantity of each of the first
and second fragments is then calculated so that a ligation/incubation reaction between the
calculated quantities of the first and second fragments will result in progeny molecules 
having the desired number of crossover events. 

Screening Methodologies and Devices 

In practicing the methods of the invention and determining the properties of
the chimeric antigen binding polypeptides of the invention, any method or device can be used.

Capillary Arrays 

Capillary arrays, such as the GIGAMATRIX™, Diversa Corporation, San 
Diego, CA, can be used to screen for or monitor a variety of compositions, including the 
polypeptides and nucleic acids of the invention. Capillary arrays provide an efficient system
for holding and screening samples. For example, a sample screening apparatus can include a
plurality of capillaries formed into an array of adjacent capillaries, wherein each capillary
comprises at least one wall defining a lumen for retaining a sample. The apparatus can
further include interstitial material disposed between adjacent capillaries in the array, and one
or more reference indicia formed within the interstitial material. A capillary for screening
a sample, wherein the capillary is adapted for being bound in an array of capillaries, can
include a first wall defining a lumen for retaining the sample, and a second wall formed of a
filtering material, for filtering excitation energy provided to the lumen to excite the sample.

A polypeptide or nucleic acid, e.g., an antibody, can be introduced in a first
component into at least a portion of a capillary of a capillary array. Each capillary of the
capillary array can comprise at least one wall defining a lumen for retaining the first
component. An air bubble can be introduced into the capillary behind the first component. A
second component can be introduced into the capillary, wherein the second component is 
separated from the first component by the air bubble. A sample of interest can be introduced 
as a first liquid labeled with a detectable particle into a capillary of a capillary array, wherein 
each capillary of the capillary array comprises at least one wall defining a lumen for retaining 
the first liquid and the detectable particle, and wherein the at least one wall is coated with a 
binding material for binding the detectable particle to the at least one wall. The method can 
further include removing the first liquid from the capillary tube, wherein the bound 
detectable particle is maintained within the capillary, and introducing a second liquid into the 
capillary tube. 

The capillary array can include a plurality of individual capillaries comprising at 
least one outer wall defining a lumen. The outer wall of the capillary can be one or more walls 
fused together. Similarly, the wall can define a lumen that is cylindrical, square, hexagonal or 
any other geometric shape so long as the walls form a lumen for retention of a liquid or sample. 
The capillaries of the capillary array can be held together in close proximity to form a planar 
structure. The capillaries can be bound together, by being fused (e.g., where the capillaries are 
made of glass), glued, bonded, or clamped side-by-side. The capillary array can be formed of
any number of individual capillaries, for example, a range from 100 to 4,000,000 capillaries. A
capillary array can form a microtiter plate having about 100,000 or more individual capillaries 
bound together. 

Arrays, or "BioChips" 

In one aspect of the invention, the chimeric polypeptides or nucleic acids of 
the invention can be analyzed by their immobilization onto an array, or "biochip." 
Alternatively, antigen binding polypeptides can be screened by immobilizing antigens to an 
array. In practicing the methods of the invention, known arrays and methods of making and 
using arrays can be incorporated in whole or in part, or variations thereof, as described, for 
example, in U.S. Patent Nos. 6,277,628; 6,277,489; 6,261,776; 6,258,606; 6,054,270; 
6,048,695; 6,045,996; 6,022,963; 6,013,440; 5,965,452; 5,959,098; 5,856,174; 5,830,645; 
5,770,456; 5,632,957; 5,556,752; 5,143,854; 5,807,522; 5,800,992; 5,744,305; 5,700,637; 
5,556,752; 5,434,049; see also, e.g., WO 99/51773; WO 99/09217; WO 97/46313; WO 
96/17958; see also, e.g., Johnston (1998) Curr. Biol. 8:R171-R174; Schummer (1997) 
Biotechniques 23:1087-1092; Kern (1997) Biotechniques 23:120-124; Solinas-Toldo (1997) 
Genes, Chromosomes & Cancer 20:399-407; Bowtell (1999) Nature Genetics Supp. 21:25- 
32. See also published U.S. patent applications Nos. 20010018642; 20010019827; 
20010016322; 20010014449; 20010014448; 20010012537; 20010008765. 

Antibodies and Immunoblots 

In one aspect of the invention, animals are immunized before isolation of 
nucleic acids encoding antigen binding sequences. Methods of immunization, producing and 
isolating antibodies (polyclonal and monoclonal) are known to those of skill in the art and 
described in the scientific and patent literature, see, e.g., Coligan, CURRENT PROTOCOLS 
IN IMMUNOLOGY, Wiley/Greene, NY (1991); Stites (eds.) BASIC AND CLINICAL 
IMMUNOLOGY (7th ed.) Lange Medical Publications, Los Altos, CA ("Stites"); Goding, 
MONOCLONAL ANTIBODIES: PRINCIPLES AND PRACTICE (2d ed.) Academic Press, 
New York, NY (1986); Kohler (1975) Nature 256:495; Harlow (1988) ANTIBODIES, A 
LABORATORY MANUAL, Cold Spring Harbor Publications, New York. Antibodies also 
can be generated in vitro, e.g., using recombinant antibody binding site expressing phage 
display libraries, in addition to the traditional in vivo methods using animals. See, e.g., 
Hoogenboom (1997) Trends Biotechnol. 15:62-70; Katz (1997) Annu. Rev. Biophys. 
Biomol. Struct. 26:27-45. 

Sources of Cells and Culturing of Cells 

Any vertebrate cell can be used as a source of nucleic acid encoding an 
antigen binding polypeptide. As noted above, immunoglobulin coding sequences can be 
isolated from cells of the immune system, e.g., B cells or plasma cells. Once a chimeric or 
modified antigen binding polypeptide coding sequence has been generated, it can be 
expressed in any cell, e.g., bacterial, archaebacterial, mammalian, yeast, fungal, insect or 
plant cells. In one aspect, the cell can be from a tissue or fluid taken from an individual, e.g., 
a patient. The cell can be from, e.g., lymphatic or lymph node samples, serum, blood, cord 
blood, CSF or bone marrow aspirations, fecal samples, saliva, tears, tissue and surgical 
biopsies, needle or punch biopsies, and the like. 

Any apparatus to grow or maintain cells can be used, e.g., a bioreactor or a 
fermentor, see, e.g., U.S. Patent Nos. 6,242,248; 6,228,607; 6,218,182; 6,174,720; 6,168,949; 
6,133,022; 6,133,021; 6,048,721; 5,660,977; 5,075,234. 

Genetic Vaccines 

The invention provides genetic vaccines comprising chimeric nucleic acids 
selected from the libraries of the invention. These genetic vaccines can be used in nucleic 
acid- or immunoglobulin-mediated immunomodulation. The invention provides various 
approaches for the evolution of genetic vaccines by stochastic (e.g., polynucleotide shuffling 
and interrupted synthesis) and non-stochastic polynucleotide reassembly. 

A genetic vaccine is an exogenous polynucleotide that produces a medically 
useful phenotypic effect upon the mammalian cell(s) and organisms into which it is 
transferred. A genetic vaccine may be in the form of "naked" nucleic acid or as a vector. The 
vector or nucleic acid may or may not have an origin of replication. For example, it may be 
useful to include an origin of replication in a vector to allow for propagation of the vector in 
order to obtain sufficient quantities of the vector prior to administration to a patient. If the 
vector is designed to integrate into host chromosomal DNA or bind to host mRNA or DNA, 
or if replication in the host is otherwise undesirable, the origin of replication can be removed 
before administration, or an origin can be used that functions in the cells used for vector 
production but not in the target cells. However, in certain situations, including some of those 
discussed herein, it is desirable that the genetic vaccine vector be capable of replicating in 
appropriate host cells. 

Vectors used in genetic vaccination can be viral or nonviral. Viral vectors are 
usually introduced into a patient as components of a virus. Exemplary vectors include, for 
example, adenovirus-based vectors (Cantwell (1996) Blood 88:4676-4683; Ohashi (1997) 
Proc. Natl. Acad. Sci. USA 94:1287-1292), Epstein-Barr virus-based vectors (Mazda (1997) 
J. Immunol. Methods 204:143-151), adenovirus-associated virus vectors, Sindbis virus 
vectors (Strong (1997) Gene Ther. 4:624-627), herpes simplex virus vectors (Kennedy 
(1997) Brain 120:1245-1259) and retroviral vectors (Schubert (1997) Curr. Eye Res. 16:656-662). 
Nonviral vectors, typically dsDNA, can be transferred as naked DNA or associated 
with a transfer-enhancing vehicle, such as a receptor-recognition protein, liposome, 
lipoamine, or cationic lipid. This DNA can be transferred into a cell using a variety of 
techniques well known in the art. For example, naked DNA can be delivered by the use of 
liposomes which fuse with the cellular membrane or are endocytosed, i.e., by employing 
ligands attached to the liposome, or attached directly to the DNA, that bind to surface 
membrane protein receptors of the cell resulting in endocytosis. Alternatively, the cells may 
be permeabilized to enhance transport of the DNA into the cell, without injuring the host 
cells. One can use a DNA binding protein, e.g., HBGF-1, known to transport DNA into a 
cell. Furthermore, DNA can be delivered by bombardment of the skin by gold or other 
particles coated with DNA that are delivered by mechanical means, e.g., pressure. These 
procedures for delivering naked DNA to cells are useful in vivo. For example, by using 
liposomes, particularly where the liposome surface carries ligands specific for target cells, or 
are otherwise preferentially directed to a specific organ, one may provide for the introduction 
of the DNA into the target cells/organs in vivo. 

EXAMPLES 

The following examples are offered to illustrate, but not to limit, the claimed 
invention. 

Example 1: Building genes using an exemplary library and method of the invention 

The following example describes building a nucleic acid, a gene, using an 
exemplary oligonucleotide library and method of the invention. 

Building polynucleotides using the methods of the invention does not require 
handling of any template or parental DNA. Codon usage can be optimized towards any 
expression host. Restriction sites can be added or changed according to cloning needs. 

This exemplary system of the invention uses a library of oligonucleotide 
building blocks to generate a DNA sequence. Oligonucleotide building blocks are designed 
for each sequence to be custom built. In one aspect, the library consists of all possible 
di-codon combinations, a total of 4096 clones, and 61 linker fragments. Each oligonucleotide 
building block is cloned, sequence verified, PCR amplified (or prepped from a restriction 
digest) and pre-cut. See Figure 1 for a summary of this exemplary iterative codon by codon 
gene building protocol. 
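
By way of illustration only, the library size stated above can be checked with a short Python sketch. It enumerates all ordered pairs of the 64 codons (64 x 64 = 4096 di-codons) and counts the 61 codons of the standard genetic code that are not stop codons. Reading the figure of 61 as the number of sense codons is an assumption of this sketch, and the variable names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code

# All 64 possible codons, and the subset that encodes an amino acid.
codons = ["".join(c) for c in product(BASES, repeat=3)]
sense_codons = [c for c in codons if c not in STOP_CODONS]

# Every ordered pair of codons corresponds to one di-codon building block.
dicodons = [first + second for first, second in product(codons, repeat=2)]

print(len(codons), len(sense_codons), len(dicodons))
```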
Building Block Library construction 

A library of 4096 unique "building block" oligonucleotides is constructed in 
which each oligonucleotide (and corresponding clone into which the oligo is inserted) 
contains one specific di-codon sequence. The "building block" oligonucleotides are PCR 
amplified. "Starter" fragments to be linked to a solid support are precut at a 3' codon. 
"Elongation fragments" are precut in a 5' codon. The "starter" fragments (to be bound to 
solid support) and "elongation fragments" are cut with different Type-IIS restriction 
endonucleases; e.g., the "starter" fragments are cut with EarI and the "elongation fragments" 
are precut with SapI, or vice versa. In one example, "starter" fragments are first cut with 
BbsI for ligation to a "hook" and then cut with EarI after coupling to the hook. "Elongation 
fragments" are amplified with primers SapF and T3 (a SapI site is introduced during PCR) and 
cut with SapI. In one exemplary protocol, PCR amplification of the building block 
oligonucleotides adds a SapI site and deletes the EarI site. Each "building block" 
oligonucleotide is cloned and each dicodon sequence verified. 

In this exemplary method, the cloning vector into which each oligonucleotide 
building block is inserted is a modification of pBluescript II KS minus™ (Stratagene, San 
Diego, CA). The following changes were made: 

Removal of vector-specific SapI and EarI sites: 

Because in some aspects SapI and EarI are used to generate overhangs in the 
building block oligonucleotides, it is necessary to remove SapI and EarI recognition sites in 
the vectors. In this example, pBluescript II KS minus™ contains three EarI sites (at positions 
518, 1038 and 2842), one of them overlapping a single SapI site (at position 1038). These 
sites can be removed by, e.g., using Stratagene's QUICKCHANGE SITE DIRECTED 
MUTAGENESIS™ kit. Successful changes can be verified by restriction cuts using SapI 
and EarI and/or sequencing. In this example, the modified vector was designated pASE. 
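
As a non-limiting sketch of how a vector sequence might be screened in silico for residual EarI and SapI sites, the following Python function searches both strands for a recognition sequence. The recognition sequences (EarI: CTCTTC; SapI: GCTCTTC) are the commonly published ones; the function names and the toy sequence are hypothetical and are not the pBluescript sequence.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_sites(seq: str, site: str) -> list[tuple[int, str]]:
    """Return (0-based position, strand) for each occurrence of `site`.

    Both EarI and SapI sites are non-palindromic, so the top strand is
    searched for the site ('+') and for its reverse complement ('-').
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


EARI = "CTCTTC"   # EarI recognition sequence
SAPI = "GCTCTTC"  # SapI recognition sequence; it contains the EarI site

toy_vector = "AAGCTCTTCGGTTTGAAGAGTT"  # hypothetical test sequence
print("EarI:", find_sites(toy_vector, EARI))
print("SapI:", find_sites(toy_vector, SAPI))
```

Because the SapI site contains the EarI site, every SapI hit is also reported as an EarI hit one base downstream, which mirrors the overlapping sites described above.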

Insertion of a single BbsI site: 

The "starter fragments" need to be ligated to the "hook" immobilized on the 
solid support; in this example, the hook is immobilized to magnetic beads. A 
non-palindromic overhang (e.g., 5'-GGGG-3') can be used in order to avoid self-ligation of the 
fragments. The sequence is made available by inserting the following double stranded fragment into the 
pASE vector (see above) cut with SacI/NotI. Into the linearized vector, insert: 
SacI ↓ NotI 

5' - AGCTCGAAGACTTG GGGTTGT CTTC ACCGCGGTGGC (SEQ ID NO: 15) 

3' - GCTTCTGAACCCCAGAATGGCGCCACCGCCGG - 5' (SEQ ID NO: 16) 
BbsI ↑ 

This introduces a BbsI site to create GGGG overhangs for high ligation 
efficiency (connection to the hook fragment on the solid support). Annealing equimolar amounts 
of PAGE purified oligonucleotides (e.g., from Integrated DNA Technologies, Coralville, IA) 
will create the double stranded (ds) fragment as shown above. Successful integration can be 
verified by restriction cut with BbsI and sequencing. The BbsI site is designed to generate a 
5'-GGGG overhang. This modified vector is designated pBbs4G. This vector (pBbs4G) can 
be used for making the library. 
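
The way a Type-IIS enzyme produces the 5'-GGGG overhang can be sketched as follows. The sketch assumes the commonly published BbsI cut geometry, GAAGAC(2/6), i.e., the top strand is cut 2 bases and the bottom strand 6 bases downstream of the recognition sequence. The function name is hypothetical, and the input is the top strand of the insert as printed above with the spaces removed.

```python
def type_iis_overhang(top: str, site: str, top_offset: int, bottom_offset: int) -> str:
    """Return the 5' overhang left downstream of the first `site` on the top strand.

    top_offset and bottom_offset are the numbers of bases between the end of
    the recognition sequence and the cut on the top and bottom strand.
    The bases between the two cuts stay single stranded on the downstream
    fragment, and are returned as read 5'->3' on the top strand.
    """
    end = top.index(site) + len(site)
    return top[end + top_offset:end + bottom_offset]


# Top strand of the inserted fragment, as printed, without spaces.
insert_top = "AGCTCGAAGACTTGGGGTTGTCTTCACCGCGGTGGC"

# Assumed BbsI geometry: GAAGAC, cut 2 (top) and 6 (bottom) bases downstream.
print(type_iis_overhang(insert_top, "GAAGAC", 2, 6))
```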

Insertion of SmaI/PstI Spacer 

In this example, inserts of the oligonucleotide library have blunt ends on one 
side and PstI compatible 3'-overhangs on the other, enabling directed cloning without further 
manipulation into a SmaI/PstI cut vector. These sites are located directly next to each other 
in the pBluescript II KS minus™ (Stratagene, San Diego, CA) vector. After the first enzyme 
cuts, the recognition sequence of the other one is very close to the end of the DNA. PstI and 
SmaI do not cut efficiently close to DNA ends. This problem can be solved by inserting the 
following dsDNA into the vector pBbs4G cut with SmaI and HindIII, dephosphorylated and gel 
purified: 

Cut pBbs4G with SmaI/HindIII, insert: 

SmaI (half) PstI EcoRI HindIII 

5' - GGGCATCATCATCATCATCTGCAGGAATTCGATATGA (SEQ ID NO: 17) 
3'- CCCGTAGTAGTAGTAGTAGACGTCCTTAAGCTATACTTCGA (SEQ ID NO: 18) 

This separates SmaI and PstI to make double cuts more efficient. The fragment can 
be generated by annealing complementary, 5'-phosphorylated oligonucleotides, as noted 
above. Successful integration can be checked by sequencing. The modified vector is 
designated pGBl. KpnI or SacI can be used instead of PstI without vector modification, but 
this may result in much shorter fragments (see below), which are more difficult to prepare 
(the efficiency of standard methods drops below about 70 base pairs). 
Design of the Building Blocks 

In this exemplary procedure, to start gene synthesis with any codon 
simultaneously at several starting points, a total of 61 "starter" and 4096 "elongation" 
fragments are used. All fragments can be cloned into pGBl (see above). The vector can be 
cut with SmaI and PstI, dephosphorylated and gel purified. 

"Starter fragments" 

The 61 "starter" clones can be created by annealing two partially 
complementary oligonucleotides, as illustrated below, filling in the 5' overhangs with 
Klenow DNA polymerase and cloning the mixture into pGBl as described above. SapI can 
be used to generate the overhang for ligation of the first elongation fragment. BsmFI can be 
used to release partial genes from the solid support and ligate those to generate full length 
genes. The vector is cut with SmaI/PstI. 

5' - GGGACGCACTTCANNNTGAAGAGCGCTGCTACTAACTGCA - 3' 
                     ACTTCTCGCGACGATGATTG - 5' 

Anneal, fill in with 
Klenow DNA polymerase 

BsmFI 

5' - GGGACGCACTTCANNNTGAAGAGCGCTGCTACTAACTGCA - 3' 
3' - CCCTGCGTGAAGTNNNACTTCTCGCGACGATGATTG - 5' 

SapI 

BbvI 

BsmFI EarI BbvI PstI 

5' - GGGACGTTCT TCGNNNNNNT GAAGAGAGCT GCTACTAACT GCA (SEQ ID NO: 19) 
3' - CCCTGCAAGA AGCNNNNNNA CTTCTCTCGA CGATGATTG - 5' (SEQ ID NO: 20) 

SapI 

The oligonucleotide can be made by "filling in": 

GGGACGTTCT TCGNNNNNNT GAAGAGAGCT GCTACTAACT GCA (SEQ ID NO: 19) 

A CTTCTCTCGA CGATGATTG (subseq of SEQ ID NO: 20) 
<-- fill in 
In one aspect, 96 colonies are picked and sequenced. Missing codons can be 
created using a sequence-specific primer instead of a degenerate primer. The cloning 
procedure is the same as outlined above. 

"Elongation Fragments" 

The "Elongation Fragments" containing all possible 4096 dicodon 
combinations (all possible two-codon combinations) can be generated according to the 
procedure as described above. The oligos used are as follows: 

5' - GGGACGCTCTTCANNNNNNTGAAGAGTGCTGCTACTAACTGCA - 3' 
                        ACTTCTCACGACGATGATTG - 5' 

Anneal, fill in with 
Klenow DNA polymerase 

BsmFI SapI EarI 

5' - GGGACGCTCTTCANNNNNNTGAAGAGTGCTGCTACTAACTGCA - 3' 
3' - CCCTGCGAGAAGTNNNNNNACTTCTCACGACGATGATTG - 5' 

SapI EarI BbvI 

The clones have this design: 

SacI BbsI NotI SpeI 
T7 promoter 

CGCGCGTAATACGACTCACTATAGGGC 
GCGCGCATTATGCTGAGTGATATCCCGCTTAACC 

Primer E_F 

BamHI BsmFI EarI BbvI PstI EcoRI HindIII ClaI 

GGATCCCCCTGGGACGTTCTTCGNNNNNNTGAAGAGAGCTGCTAC 
CCTAGGGGGACCCTGCAAGAAGCNNNNNNACTT 

SalI XhoI KpnI 

T3 promoter 

CGTCGACCTCGAGGGGGGGCCCGGTACCCAGCTTTTGTTCCCT (a) 

GCAGCTGGAGCTCCCCCCCGGGCCATGGGTCGAA (b) 

strand (a) is (SEQ ID NO:21) 
strand (b) is (SEQ ID NO:22) 
SapI is used to generate 5' overhangs prior to the ligation. EarI is used to 
create 5' overhangs in the next codon for addition of the next fragments. BsmFI and BbvI 
restriction sites are positioned to enable cutting within the first two and last two codons of a 
synthesized DNA fragment. BsmFI is used to release partial genes from the solid support. 
BbvI is used to generate compatible overhangs at the 3' end of partial genes attached to the 
solid support. 

The library comprises 4096 clones. Two of the clones (coding for the 
sequences CTCTTC and GAAGAG) cannot be used for the assembly process because they 
encode the EarI recognition sequence. This is not a problem because the target sequences 
can be modified accordingly. In order to capture and conserve the entire variability, 10,000 
single colonies are picked into 96-well plates. An automated colony picker can be used for 
this purpose. In one aspect, it is sufficient to have 96 unique clones. In one aspect, enough 
clones are sequenced to be able to synthesize an artificial gene of one kbp in length. 
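
The following Python sketch, offered only as an illustration, identifies the di-codons that carry the EarI recognition sequence on either strand, and gives a rough estimate of how much of a 4096-member library is expected to be represented among a given number of picked colonies. The coverage estimate assumes that every di-codon is equally likely in each colony, which a degenerate-oligo library need not satisfy; the function names are hypothetical.

```python
from itertools import product


def reverse_complement(seq: str) -> str:
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


EARI = "CTCTTC"  # EarI recognition sequence

codons = ["".join(c) for c in product("ACGT", repeat=3)]
dicodons = [a + b for a, b in product(codons, repeat=2)]

# A 6-base di-codon is unusable if it is itself the EarI site on either strand.
blocked = [d for d in dicodons if EARI in d or reverse_complement(EARI) in d]
print(blocked)


def expected_coverage(variants: int, picks: int) -> float:
    """Expected fraction of variants seen at least once, assuming uniform sampling."""
    return 1.0 - (1.0 - 1.0 / variants) ** picks


print(f"{expected_coverage(4096, 10_000):.1%} of di-codons expected among 10,000 colonies")
```

Under the uniform-sampling assumption, roughly nine in ten di-codons would be expected among 10,000 colonies, so any missing members would be identified by sequencing and made individually.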

In one aspect, only four different class IIS restriction enzymes (SapI, EarI, 
BsmFI, BbvI) are used to generate compatible overhangs for the ligation of the individual 
building blocks. SapI and EarI generate 3-base 5' overhangs; BsmFI and BbvI generate 4-base 
5' overhangs. The design of the starter/elongation clones is shown in Table 2: 
Starter clones 

T7 primer SacI BbsI NotI XbaI 

TAATACGACTCACTATAGGGCGAATTGGAGCTCGAAGACTTGGGGTCTTACCGCGGTGGCGGCCGCTCTA 
ATTATGCTGAGTGATATCCCGCTTAACCTCGAGCTTCTGAACCCCAGAATGGCGCCACCGCCGGCGAGAT 

BsmFI SapI BbvI PstI EcoRI 

GAACTAGTGGATCCCCCGGGACGCACTTCANNNTGAAGAGCGCTGCTACTAACTGCAGGAATTCGATATG 
CTTGATCACCTAGGGGGCCCTGCGTGAAGTNNNACTTCTCGCGACGATGATTGACGTCCTTAAGCTATAC 

ClaI SalI XhoI KpnI 

AAGCTTATCGATACCGTCGACCTCGAGGGGGGGCCCGGTACCCAGCTTTTGTTCCCTTTAGTGAGGGTTA 
TTCGAATAGCTATGGCAGCTGGAGCTCCCCCCCGGGCCATGGGTCGAAAACAAGGGAAATCACTCCCAAT 

T3 primer 

Elongation clones 

T7 primer SacI BbsI NotI XbaI 

TAATACGACTCACTATAGGGCGAATTGGAGCTCGAAGACTTGGGGTCTTACCGCGGTGGCGGCCGCTCTA 
ATTATGCTGAGTGATATCCCGCTTAACCTCGAGCTTCTGAACCCCAGAATGGCGCCACCGCCGGCGAGAT 

Table 2: Design of the building blocks. 
Starter fragments. The inserts can be recovered as restriction fragments (BbsI/KpnI; 140 bp) 
or by amplification with T7/T3 primers (210 bp) and a restriction cut with BbsI (170 bp). 
Elongation fragments. The inserts can be recovered as restriction fragments (SapI/KpnI; 88 
bp) or by amplification with S1/T3 primers (127 bp) and a restriction cut with SapI (110 bp). 

Preparation of building blocks: 

Starter and elongation fragments can be generated by PCR, purified by using, 
e.g., the Qiagen PCR purification kit, digested by SapI, and purified again by using a Qiagen 
PCR purification kit. These processes can be carried out in a 96-well format on, e.g., a 
Beckman BIOMEK 2000™. The standard operation protocols are used. The purified 
building blocks can be stored at a standardized DNA concentration (e.g., 100 pmol/µl) in 96-well 
deep blocks (up to 2 ml). 

It is not anticipated that PCR-introduced nucleotide substitution will cause a 
significant number of mutations in the synthesized gene. A THERMAL ACE™ DNA 
polymerase (Invitrogen) can be used; it is a high fidelity/high efficiency enzyme. The error 
rate is 1/(6 x 10^5) per base. This means one out of 1500 copies of a 200 bp PCR product 
(600,000 bases : 400 bases) has one error on average. Only 6 bp (12 bases) of each fragment are used 
for the synthesis. The probability that a given error falls within these 12 bases is only 3% for a 200 bp 
product (12:400). Therefore only one out of 50,000 copies has an error introduced in the di-codon 
region (= 0.002%; compared to synthetic oligos: 2 - 5%). Mutations outside of the di-codon 
region do not carry through to the synthesized sequence. 
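
The arithmetic in the preceding paragraph can be reproduced with exact fractions. This is a worked check of the stated figures only; the variable names are hypothetical and the per-base error rate is taken as given in the text.

```python
from fractions import Fraction

error_rate = Fraction(1, 600_000)  # errors per base, as stated for the polymerase
product_bases = 2 * 200            # a 200 bp product has 400 bases
dicodon_bases = 2 * 6              # the 6 bp di-codon region has 12 bases

# Average number of errors per PCR product copy.
errors_per_copy = error_rate * product_bases

# Share of the product that is actually carried into the synthesized gene.
fraction_in_dicodon = Fraction(dicodon_bases, product_bases)

# Average number of errors per copy that land in the di-codon region.
dicodon_errors_per_copy = errors_per_copy * fraction_in_dicodon

print("one error per", 1 / errors_per_copy, "copies")
print("share of errors in the di-codon:", f"{float(fraction_in_dicodon):.0%}")
print("one di-codon error per", 1 / dicodon_errors_per_copy, "copies")
print("di-codon error frequency:", f"{float(dicodon_errors_per_copy):.3%}")
```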

Mutated codons are further discriminated during ligation. Several hundred 
clones from synthetic genes and gene reassembly projects have been sequenced and no 
introduced base error or missing/wrong bases have been seen in the overhang region. 

Plasmid preparation is an alternative to PCR amplification. Building blocks 
can be prepared from restriction digestion of the plasmid DNA. The fragments can be 
purified from their vector backbone by a size-fractionation column. This method is an 
alternative if nucleotide substitution causes a high mutation rate. 

The Elongation Protocol 

In one aspect, the elongation cycle involves 3 steps: (1) covalent linkage of 
the new fragment by DNA ligase, (2) fill-in of the unligated overhangs by Klenow DNA 
polymerase, and (3) restriction digestion by EarI to generate the next overhang. Each step 
can be optimized separately; several short DNA sequences (30-60 bp) can then be synthesized to 
test and optimize the entire synthesis cycle. The synthesized fragments can be cloned and 
sequenced to verify the efficiency and the fidelity of the elongation reactions. 

In one aspect, reassembly of DNA molecules from synthetic oligonucleotides 
using the solid-phase support is applied to the reassembly of gene families. In this protocol, 
full-length reassembled genes were obtained by step-wise ligation of annealed 
oligonucleotides of 30-50 bases. 

Two different sets of building blocks need to be prepared from the library's 
"archived" clones: 

- starter fragments 

o can be linked to solid support 

o amplification with primers E_F and T3 

o cut with BbsI for ligation to hook 

o cut with EarI after coupling 

- elongation fragments 

o amplification with primers SapF and T3 

o SapI site introduced during PCR 

o cut with SapI 

o used to elongate starter fragments by one codon/elongation cycle 

Hook for linking starter fragments to solid support: Immobilization of the hook fragment 

Paramagnetic beads coated with Streptavidin can be purchased from Dynal A.S. 
(Oslo, Norway). The 5'-biotinylated forward oligo 
(5'-bio-GAACGATAATAAGCTTGATGACGAAGACAT-3') (SEQ ID NO:23) and the reverse 
oligo (5'-CCCCATGTCTTCGTCATCAAGCTTATTATCGTTC-3') (SEQ ID NO:24) can 
be purchased, e.g., from Integrated DNA Technologies Inc. (Coralville, IA). The two 
oligonucleotides can be annealed to generate the hook fragments. The hook fragments can be 
immobilized to the beads according to manufacturer's instructions (e.g., the Dynal protocol). 

T7 promoter 

(NNN)x CGCGCGTAATACGACTCACTATAGGGCGAATTGGAGCTC (SEQ ID NO:25) 

(NNN)x GCGCGCATTATGCTGAGTGATATCCCGCTTAACCTCGAGCCCC (SEQ ID NO:26) 

Preparation of "Hook": 

- length/sequence variable 

- may contain promoter (e.g., T7) for in vitro transcription/translation 

- compatible overhang for ligation of starter fragments 

Alternative method: 

Instead of using PCR fragments derived from sequence verified clones, 
building blocks are synthesized from short (about 20 to 25 base pairs (bp)) double stranded 
(ds) DNA fragments derived from oligos. Only the 3 bases at the 3' end of the bottom strand 
(see figure) are critical for building a correct sequence. 

Principle: 

>solid support< — hook - starter fragment — codon specific overhang 

Hook for linking starter fragments to solid support: 

T7 promoter 

(NNN)x CGCGCGTAATACGACTCACTATAGGGCGAATTGGAGCTC (SEQ ID NO:27) 

(NNN)x GCGCGCATTATGCTGAGTGATATCCCGCTTAACCTCGAGCCCC (SEQ ID NO:28) 

Starter fragment: 

BsmFI 

GGGGATCC TGGGAC GTTCTTCG (SEQ ID NO: 29) 

TAGGACCCTGCAAGAAGCNNN (SEQ ID NO: 30) 

Building blocks: 

NNNnnnTGAAGAGAGCTGCTACTAA (SEQ ID NO:31) 

nnnACTTCTCTCGACGATGATTGACGTCCTTAA (SEQ ID NO:32) 

In summary, as illustrated in Figure 1, the "elongation cycle" of this 
exemplary gene building method of the invention comprises: "loading" starter oligo onto 
substrate; ligation (with any ligase, e.g., T4 ligase or E. coli ligase); wash; fill-in ends; wash; 
cut with restriction endonuclease; wash; repeat (reiterate cycle). Any type of protocol or 
alternative protocols can be used. Optimization of conditions can be done by routine 
screening of a range of parameters, e.g., temperature, time, buffers, number of elongation 
cycles, which ligase to use, choice of solid substrate, if any, and the like. 
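
Purely as a bookkeeping sketch of the cycle summarized above (load, then ligate, wash, fill in, wash, cut, wash, repeated once per added codon), the following Python code tracks the growing chain and logs each step. It models no chemistry; the function names and the step labels are hypothetical.

```python
CYCLE_STEPS = ("ligate", "wash", "fill-in", "wash", "cut", "wash")


def split_codons(gene: str) -> list[str]:
    """Split a coding sequence into codons."""
    if not gene or len(gene) % 3 != 0:
        raise ValueError("gene length must be a non-zero multiple of 3")
    return [gene[i:i + 3] for i in range(0, len(gene), 3)]


def run_elongation(gene: str) -> tuple[str, list[str]]:
    """Return the assembled chain and a log of the steps performed."""
    codons = split_codons(gene)
    chain = codons[0]                       # "loading": the starter carries codon 1
    log = [f"load starter {codons[0]}"]
    for next_codon in codons[1:]:
        for step in CYCLE_STEPS:
            if step == "ligate":
                chain += next_codon         # one codon is added per cycle
            log.append(f"{step} (adding {next_codon})")
    return chain, log


chain, log = run_elongation("ATGGCTGAA")
print(chain)
print(len(log), "logged steps")
```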
Ligation 

Enzymes 

In one aspect, the T4 DNA ligase is used; it is the most commonly used 
enzyme in DNA ligation reactions. It has a high specific activity and joins 5' or 3' 
protruding compatible overhangs very efficiently. It also ligates blunt-ended fragments but 
at a lower efficiency. This creates a possible problem, because the building blocks (if 
generated by PCR) are blunt-ended on one side and could ligate to other blunt-ended 
fragments resulting from the fill-in reaction. Dimerization of building blocks will not be a 
problem because non-phosphorylated primers are used for PCR. In one aspect, to avoid these 
side reactions E. coli DNA ligase can be used as an alternative to T4 DNA ligase. E. coli 
DNA ligase is NAD+-dependent and ligates only cohesive ends of DNA fragments. Its fidelity is 1 
to 2 orders of magnitude higher than that of T4 DNA ligase, but its specific activity is lower. The 
E. coli DNA ligase is commercially available. Using routine screening protocols, both 
enzymes can be evaluated to determine the most efficient procedure under desired 
conditions. 

Optimization 

Using routine screening protocols, the ligation efficiency under different 
conditions can be optimized for, e.g., desired results, materials and/or conditions. Three 
parameters can be optimized: DNA concentration, enzyme units, and reaction time. A 
fluorescence (e.g., 6-FAM) labeled T3 primer (see Table 2 above) can be used with an 
unlabeled S1 primer in PCR reactions, using known di-codon clones as templates, to generate 
labeled elongation fragments. Several labeled fragments can be generated to cover different 
GC content in the overhangs. These fragments can be used to monitor the ligation efficiency 
during protocol development. In each reaction, one of the labeled fragments can be used as 
the last one to be added to the elongation chain (2 to 3 codons for the purpose of protocol 
development). Upon completion of the reaction, the fragments can be released from the 
solid support and incorporated label can be analyzed, e.g., on an ABI PRISM 310 GENETIC 
ANALYZER™. A method as described by, e.g., Liu (1997) Appl. Environ. Microbiol. 
63:4516-4522, can be used. 

Fill-in reaction 

Enzymes 

In the ligation step, a molar excess of the next building block can be used to 
saturate the fragments attached to the beads and to drive the ligation to completion. The 
methods of the invention can be a multi-step process; therefore, even trace amounts of 
un-ligated fragments could reduce the accuracy and quality of the final product. To prevent 
un-ligated fragments from elongation in later cycles (same codon), a Klenow DNA polymerase 
can be used after each ligation step to fill in un-ligated overhangs. Klenow DNA polymerase 
has the advantage of being active in almost all commonly used restriction buffers, avoiding 
additional buffer exchange. In one aspect, the enzyme is inactivated, e.g., heat-inactivated, 
before the next ligation step. 
Optimization of fill-in conditions 

Using routine screening protocols, fill-in reaction conditions can be optimized 
for, e.g., desired results, materials and/or conditions. In one aspect, to optimize reaction 
conditions (fill in of all ends), a DNA fragment (30-40 bp) is used with a 3-base 5' overhang 
as a substrate for the reaction. Two complementary oligos can be designed. The forward 
oligo can contain a 5' fluorescence (e.g., 6-FAM) label. The reverse oligo can be 3 bases 
longer at the 5' end than the forward oligo. Annealing of these two oligos will generate a 
fluorescence labeled DNA fragment with a 3-base 5' overhang. The annealed fragment can 
be used as the substrate for the optimization of the fill-in reaction. Upon the completion of 
the reaction, the sample will be analyzed on, e.g., an ABI PRISM 310 GENETIC 
ANALYZER™ as described above. 

The percentage of the unfilled fragment (same length as the forward oligo), 
partially filled fragments (one or two bases longer than the forward oligo), and completely 
filled fragment (same length as the reverse oligo) can be determined to assess the efficiency 
of the fill-in reaction. The fill-in reaction has to be optimized regarding (1) enzyme 
concentration, (2) buffer composition, (3) incubation time, and (4) inactivation 
temperature/time. 
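
The percentages described above could be computed from fragment-analysis peak data along the following lines. This is a minimal sketch that assumes the analyzer output has already been reduced to a mapping from labeled-fragment length (in bases) to peak area; the function name, the input format and the example values are hypothetical.

```python
def fill_in_summary(peaks: dict[int, float], forward_len: int, overhang: int = 3) -> dict[str, float]:
    """Classify labeled fragments by length and return percentages of total signal.

    unfilled: same length as the forward oligo
    partial:  longer than the forward oligo but shorter than the reverse oligo
    complete: same length as the reverse oligo (forward length + overhang)
    """
    total = sum(peaks.values())
    if total <= 0:
        raise ValueError("no signal in peak data")
    full_len = forward_len + overhang
    unfilled = peaks.get(forward_len, 0.0)
    complete = peaks.get(full_len, 0.0)
    partial = sum(area for length, area in peaks.items() if forward_len < length < full_len)
    return {
        "unfilled": 100 * unfilled / total,
        "partial": 100 * partial / total,
        "complete": 100 * complete / total,
    }


# Hypothetical peak areas for a 30-base labeled forward oligo.
example_peaks = {30: 5.0, 31: 3.0, 32: 2.0, 33: 90.0}
print(fill_in_summary(example_peaks, forward_len=30))
```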

Restriction digest optimization 

In one aspect, EarI is used after the fill-in reaction to generate a new 
overhang. Optimization of this step can include enzyme concentration and incubation time. 
A strategy similar to the one used for the optimization of the ligation reaction will be used for 
this reaction. A labeled building block can be linked to the hook fragment by ligation and cut 
with EarI. Release of labeled fragment can be analyzed on, e.g., an ABI PRISM 310 
GENETIC ANALYZER™ as described above. 

Software development and automation 

Manipulation of a target sequence 

To manipulate a sequence that is synthesized by the methods of the invention, 
silent mutations can be performed for host optimization and/or for the elimination of 
restriction sites for EarI, SapI, BsmFI and/or BbvI in the sequence (e.g., newly synthesized 
gene). In one aspect, sequence manipulation is determined by software analyses in 
preparation for synthesis by the methods of the invention. In one aspect, silent mutations for 
both codon optimization and restriction site manipulation are performed. 

Automation for building block preparation 

In one aspect, preparation of building blocks is performed on a Beckman 
BIOMEK 2000™ using off-the-shelf software and preparation kits. These operations are 
currently standard procedures; no further development is required to perform this step of the 
protocol. 

Software to generate a sequence from available building blocks 

If not all building blocks are available, it may be necessary for a sequence to 
be built from the available material. A software application can be written that takes the 
sequencing results of the available building blocks into account and creates a feasible 
sequence. The software can loop through all wells in the experiment and create a database of 
all other wells that have the complementing sequence. To create the sequence, the software 
can pick a building block to start with and choose randomly from all of the building blocks 
that can be added to that one. The system can repeat this process for as many building blocks 
as are required for the desired length. 
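
One way such an application might be sketched is shown below. The sketch assumes that a di-codon building block can follow another when its first codon equals the second codon of the preceding block, so that each added block extends the sequence by one codon; this compatibility rule, the function name and the example blocks are assumptions made for illustration.

```python
import random


def build_feasible_sequence(available: set[str], n_blocks: int, rng: random.Random) -> str:
    """Chain n_blocks randomly chosen, mutually compatible di-codon blocks."""
    if n_blocks < 1 or not available:
        raise ValueError("need at least one block and a non-empty library")

    # Index ("database") of the blocks that can follow a given codon.
    by_first_codon: dict[str, list[str]] = {}
    for block in sorted(available):
        by_first_codon.setdefault(block[:3], []).append(block)

    current = rng.choice(sorted(available))   # pick a block to start with
    sequence = current
    for _ in range(n_blocks - 1):
        candidates = by_first_codon.get(current[3:], [])
        if not candidates:
            raise ValueError(f"no available block starts with {current[3:]}")
        current = rng.choice(candidates)      # random choice among compatible blocks
        sequence += current[3:]               # each block contributes one new codon
    return sequence


# Hypothetical set of sequence-verified di-codon blocks.
available_blocks = {"ATGGCT", "GCTGAA", "GAAATG", "GCTATG"}
seq = build_feasible_sequence(available_blocks, 5, random.Random(0))
print(seq)
```

A dead end (a block whose second codon starts no available block) is reported as an error here; a fuller implementation could backtrack instead.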

Automation to execute the elongation protocol 

To execute the elongation protocol, an automation system can be developed 
that will read a file containing the gene sequence into memory and command a Beckman 
BIOMEK 2000™ robot to perform the steps in the protocol. To choose building blocks, the 
software can read the first and second codon in the sequence being synthesized. That 
sequence uniquely identifies a building block that can then be pipetted from the appropriate 
building block material plate. After loading the building block material, the robot can 
automatically perform the remainder of the elongation cycle. The next building block can be 
determined from the second and third codons in the sequence. This process can be repeated 
until the gene is complete. 
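
The selection of building blocks described above amounts to sliding a two-codon window along the gene, one codon at a time. A minimal Python sketch follows; the function name is hypothetical and the robot control itself is not modeled.

```python
def building_blocks_for(gene: str) -> list[str]:
    """Return the di-codon building blocks needed, in order of use.

    The first block is codons 1+2, the next is codons 2+3, and so on.
    """
    if len(gene) % 3 != 0 or len(gene) < 6:
        raise ValueError("gene must contain at least two complete codons")
    codons = [gene[i:i + 3] for i in range(0, len(gene), 3)]
    return [codons[i] + codons[i + 1] for i in range(len(codons) - 1)]


print(building_blocks_for("ATGGCTGAATAA"))
```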

Synthesis of an Artificial Gene 

In one aspect, a gene for an artificial protein sequence with a length of about 
300 residues is generated based on the available di-codon clones. The gene can be 
synthesized according to the optimized elongation protocol, as discussed above. To 
maximize efficiency, small, equally sized fragments can be synthesized in parallel (round I). 
These partial genes can be used as building blocks in round II to generate the full-length 
gene. The number of codons per fragment in round I can be determined by the maximum 
number of cycles, which can be carried out from one starting point (see below). 

Up to 22 fragments have been joined using the exemplary protocol of the 
invention. For a gene of 300 codons, 14 fragments can be synthesized in parallel in 
round I of synthesis. In the second round of the synthesis, 13 fragments can be ligated to the 
first fragment sequentially. The length of the incoming fragment may have little or no effect 
on the ligation efficiency. Thus, the efficiency of the second round synthesis of the 14 
fragments can be similar to the first round synthesis. 
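
The fragment count quoted above is consistent with a simple ceiling division, sketched below. The sketch assumes, as an interpretation not stated explicitly in the text, that the limit of 22 joins translates into at most 22 codons per round I fragment; the function name `round_one_plan` is hypothetical.

```python
import math


def round_one_plan(total_codons, max_codons_per_fragment):
    """Split a gene into the fewest near-equal fragments that respect the cycle limit."""
    n_fragments = math.ceil(total_codons / max_codons_per_fragment)
    base, extra = divmod(total_codons, n_fragments)
    # 'extra' fragments carry one additional codon so the sizes sum to the total.
    sizes = [base + 1] * extra + [base] * (n_fragments - extra)
    return n_fragments, sizes


n_fragments, sizes = round_one_plan(300, 22)
round_two_ligations = n_fragments - 1  # fragments added to the first fragment
print(n_fragments, round_two_ligations, sizes)
```

Under this assumption a 300-codon gene gives 14 fragments of 21 or 22 codons each and 13 sequential ligations in round II, matching the numbers in the paragraph above.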

The same artificial gene can be synthesized using oligos and a standard solid- 
phase protocol. Oligos can be ordered from a commercial source, e.g., Integrated DNA 
Technologies, and ligated to synthesize the full-length gene. This product can be used as a 
control to evaluate the efficiency and accuracy of additional products of the methods of the 
invention, as compared to a traditional method. At least 20 clones from each experiment can 
be sequenced and compared. 

Example 2: Antibody reassembly 

The following example describes implementation of the antibody reassembly 
methods of the invention to generate chimeric antigen binding polypeptides. 

Reassembly strategy: 

A cloning vector was designed as schematically illustrated in Figure 1. Any 
ribosome binding site (RBS) sequence or green fluorescent protein coding sequence (GFP) 
can be used, many of which are well known in the art. 

Reassembly strategy for lambda light chains: 

To reassemble lambda light chains, three domains were provided: 

• Vλ : 38 sequences in 10 families; about 300 base pairs (bp) in length (~ 300 bp) 

• Jλ : 4 sequences; about 35 base pairs (bp) in length (~ 35 bp) 

• Cλ : 1 sequence; about 320 base pairs (bp) in length (~ 320 bp) 

-> 38 x 4 x 1 = 152 different combinations 

Vλ sequences were PCR amplified with gene specific primers: 

=> 5' oligos are designed with a XhoI site; 3' primers are designed with an 
extension/SapI site (see scheme in Figure 2); 

=> Jλ sequences are generated from oligos (see Fig. 2 and SEQ ID NO:1, 
SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4); 

=> Cλ sequence is PCR amplified with an oligo including a BsrDI site at the 
5' end and a XbaI site at the 3' end. 

Because 1 Vλ gene has an internal SapI site (and is therefore excluded): 
-> 37 x 4 x 1 = 148 combinations 

Figure 2 schematically illustrates an exemplary scheme to reassemble lambda 
light chains according to the methods of the invention. J region oligos (in the center shaded 
box) are SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4. 

Primers for PCR amplification of Vλ and Cλ are: 
Reverse primer Vλ add-on: 

CATCA TGCTCTTCA CACMNM (SEQ ID NO:5) plus gene specific 
sequence (M = C or A) 

Forward primer Cλ 5' add-on: 

CTACTAGGTCTCATCCTG (SEQ ID NO:6) plus gene specific sequence; 
(last codon in J region changed from CTA to CTG because of codon usage in E. coli). 

Reassembly strategy for kappa light chains: 

To reassemble kappa light chains, three domains were provided: 

• Vκ : 49 sequences in 7 families; about 300 base pairs (bp) in length (~ 300 bp) 

• Jκ : 5 sequences; about 35 base pairs (bp) in length (~ 35 bp) 

• Cκ : 1 sequence; about 320 base pairs (bp) in length (~ 320 bp) 

-> 49 x 5 x 1 = 245 combinations 

Figure 3 schematically illustrates an exemplary scheme to reassemble kappa 
light chains according to the methods of the invention. J region oligos (in the center shaded 
box) are SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11. 

Vκ sequences were PCR amplified with gene specific primers: 

=> 5' oligos are designed with XhoI sites and 3' primers are designed with 
extension BsrDI sites (see scheme in Figure 3); 

=> Jκ sequences are generated from oligos (see Figure 3 and SEQ ID NO:7, 
SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11); 

=> Cκ sequences are PCR amplified using oligos including a BsaI site at the 
5' end and a XbaI site at the 3' end. 

Primers for PCR amplification of Vκ and Cκ are: 
Reverse primer Vκ add-on: 

CATCAT GCAATG (SEQ ID NO: 12) plus gene specific part (the first base of 
the last codon is skipped) 

Forward primer Cκ 5' add-on: 

CTACTAGGTCTCAAA (SEQ ID NO:13) plus gene specific sequence. 
Reassembly of heavy chains: 

Immunoglobulin heavy chains were reassembled with four domains: 

• VH : 57 sequences in 7 families; ~ 300 bp 

• DH : 116 sequences (both orientations, different reading frames included); ~ 20 bp 

• JH : 12 sequences; ~ 60 bp 

• CH : 1 sequence; ~ 300 bp 

-> 57 x 116 x 12 x 1 = 79344 combinations 
Reassembly strategy: 

- PCR amplify VH genes with gene specific primers 

- Primers include a SacI site at the 5' end 

- Primers include a SapI site at the 3' end to generate 3 bp overhangs in the last codon; the last 
codon is AGA for most genes (45 out of 57) 

- DH and JH genes are synthesized from oligos (see scheme below); the first library targets 
only AGA junctions and TAC junctions (7 of 12 J's) 

- PCR amplify the CH gene, including a BsaI or BsmBI site at the 5' end and a SpeI site at 
the 3' end 

-> 45 x 116 x 7 x 1 = 36540 combinations 
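
The library sizes quoted in this example are simply the products of the number of sequences available for each domain. The following sketch recomputes them from the domain counts given in the text; the dictionary keys are descriptive labels chosen here for illustration.

```python
from math import prod

# Number of sequences per domain, in V, (D), J, C order, as listed in this example.
DOMAIN_COUNTS = {
    "lambda, all V": (38, 4, 1),
    "lambda, V without internal SapI site": (37, 4, 1),
    "kappa": (49, 5, 1),
    "heavy, all domains": (57, 116, 12, 1),
    "heavy, first library (AGA and TAC junctions)": (45, 116, 7, 1),
}

library_sizes = {name: prod(counts) for name, counts in DOMAIN_COUNTS.items()}

for name, size in library_sizes.items():
    print(f"{name}: {size} combinations")
```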
Primers for PCR amplification of VH and CH are: 
Reverse primer VH add-on: 

CATCATGCTCTTCA (SEQ ID NO:14) plus gene-specific part 

Forward primer CH 5' add-on: 

CTACTAGGTCTC (SEQ ID NO:15) plus gene specific part 

Figure 4 schematically illustrates an exemplary scheme to reassemble 
antibody heavy chains according to the methods of the invention. 

Example 3 : Approaches to step-wise nucleic acid reassembly: Tandem Reassembly 

The following example describes an exemplary procedure of the invention. 
For example, step-wise nucleic acid reassembly (i.e., "Tandem Reassembly") can be used in 
conjunction with the nucleic acid synthesis methods of the invention. In one aspect, step- 
wise nucleic acid reassembly is used to assemble nucleic acids made by iterative assembly of 
oligonucleotide building blocks using the compositions and methods of the invention. In one 
aspect, step-wise nucleic acid reassembly is used to further modify the chimeric antibodies of 
the invention. In one aspect, the products of step-wise nucleic acid reassembly are isolated 
and/or purified using the invention's compositions and methods for purifying double- 
stranded polynucleotides lacking base pair mismatches, insertion/deletion loops and/or 
nucleotide gaps. 

This example is provided to illustrate an exemplary step-wise application of 
nucleic acid reassembly. This step-wise approach can allow the construction of products to 
be expedited by allowing the construction of partial reassembly products (or reassembly sub- 
products or intermediate reassembly products) to occur simultaneously or in parallel, and for 
these partial reassembly products to then be assembled into final products. The following 
example illustrates this step-wise reassembly approach using 3 partial products, but in 
different aspects of this invention, different numbers of partial products can be used (e.g. 
corresponding to every integer value from 2 to one billion). In this approach, pools of 
nucleic acid fragments (or nucleic acid building blocks) containing sequences from each gene 
(or other sequence, e.g. gene pathway or regulatory motif), to be reassembled are stepwise 
ligated but not to full length. 

In this example, the assembly process was started from three positions within 
the sequences: the 5'-end, an internal position (Internal) and the 3'-end. Overhangs at the 
junction points are designed to accommodate a biotinylated hook containing appropriate 
restriction sites (e.g. the solid phase protocol according to Dynal A.S., Oslo, Norway, see 
Biomagnetic Techniques in Molecular Biology - Technical Handbook, 3rd edition, section 
5.1 entitled: "Solid-phase gene assembly", page 135-137). 

The example illustrated in Figure 6 is for the reassembly of three esterase 
genes (a "three points ligation approach" for the reassembly of three esterase genes). After 
alignment of the three parental sequences, overhangs were designed and corresponding 
oligos were synthesized. Prior to the reassembly, analogous sequences were pooled into one 
sample and 19 pools of nucleic acid building blocks were created (the 19 nucleic acid 
building blocks were named F1 to F19). Reassembly was carried out with the pools 
following standard procedures. Three sub-products were made: F1-7, F8a-13 and F14-19. 
Assembly processes were performed either in the 5' to 3' direction of the genes or, e.g. for the 
F14-19 intermediate product, in the 3' to 5' direction. 

Once the three sub-products were made using solid phase bead supports, the 
F8a-13 and F14-19 sub-products were released from the beads using shift restriction enzymes 
(see Figure 7A), e.g. Bsa I or Bsb I (others can be used as well). Fig. 7A illustrates the elution 
of reassembled DNA from the solid support using alternative restriction sites engineered in 
the biotinylated hook: eluted F1-7 (lanes 2-3), eluted F8a-13 (lanes 4-5), and eluted F14-19 
(lane 6); DNA ladders (lanes 1 and 7). 

The released F8a-13 was then assembled onto the bead-attached F1-7 sub-
product, followed by the assembly of the F14-19 sub-product. Sub-products F8a-13 and 
F14-19 can be added in molar excess to facilitate the generation of full-length products. 
Fig. 7B illustrates the elution of final reassembled products from the solid support 
(lane 4); DNA ladders (lanes 1, 2, 3, and 5). The intended full-length product was then 
gel purified for cloning and library generation. 

Example 4 : An exemplary oligonucleotide purifying protocol: "MutS treatment" 

This example describes an exemplary oligonucleotide purifying method of the 
invention, "MutS treatment." 

Reassembly of the 1658 OT5 gene 
This example illustrates that the treatment of reassembly fragments (or nucleic 
acid building blocks) with a MutS protein-based filtering (or purification) step substantially 
increased the yield of intact open reading frames that resulted from the nucleic acid 
reassembly process of the invention. To demonstrate this, the gene of a fluorescent protein 
was synthesized from nucleic acid building blocks with or without prior MutS treatment. 

From the 732 base pair (bp) gene sequence for the fluorescent protein 1658 
OT5, suitable nucleic acid building blocks were designed and the corresponding 
oligonucleotides (22 to 59 bases in length) were synthesized chemically. Twenty reassembly 
fragments were prepared by annealing of 20 forward and 20 reverse oligonucleotides. In one 
arm of the experiment, the nucleic acid building blocks (concentration 25 pmol/µl) were left 
untreated, and in another arm of the experiment the nucleic acid building blocks were 
subjected to the following MutS treatment protocol: 

MutS treatment: Fragments (1000 pmol) were added to 349 µl of a reaction 
mix (20 mM Tris/Cl pH 8.0, 90 mM KCl, 1 mM DTT, 5 mM MgCl2, 10 % v/v glycerol) and 
supplemented with 17.9 µl MutS (Epicentre, 2 mg/ml). The reaction mixture was incubated 
for 1 hour at room temperature, transferred into Microcon YM-100 (Millipore) filtration units 
and spun for 20 min at 4,700 g. The flow through was loaded onto YM-10 (Millipore) 
filtration units and concentrated by centrifugation (30 min, 13,800 g). The retentate was 
recovered and the volume was adjusted to a final oligonucleotide concentration of 
approximately 25 pmol/µl. 
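
The quantities in this protocol can be checked with a little arithmetic, sketched below. The sketch assumes complete recovery of the 1000 pmol of fragments through both filtration steps, which the text does not state and which is unlikely to hold exactly in practice; the `recovery` parameter and both function names are hypothetical.

```python
def final_volume_ul(amount_pmol, target_pmol_per_ul, recovery=1.0):
    """Volume needed to reach the target concentration for the recovered material."""
    return amount_pmol * recovery / target_pmol_per_ul


def protein_mass_ug(volume_ul, conc_mg_per_ml):
    """Mass of protein in a given volume; 1 mg/ml is the same as 1 ug/ul."""
    return volume_ul * conc_mg_per_ml


adjusted_volume = final_volume_ul(1000, 25)   # ul, assuming 100 % recovery
muts_mass = protein_mass_ug(17.9, 2.0)        # ug of MutS added to the reaction
print(adjusted_volume, round(muts_mass, 1))
```

With full recovery the retentate would be adjusted to about 40 µl; any loss on the filters lowers that volume proportionally.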

The nucleic acid reassembly process of the invention was then continued 
using magnetic beads as solid support (the solid phase protocol used was according to Dynal 
A.S., Oslo, Norway, see Biomagnetic Techniques in Molecular Biology - Technical 
Handbook, 3rd edition, section 5.1 entitled: "Solid-phase gene assembly", page 135-137), and 
using MutS-treated nucleic acid building blocks in one experimental arm and untreated 
nucleic acid building blocks in the other arm. The final nucleic acid reassembly product was 
made by step-wise cycles of assembly and washes to remove unbound fragment. The 
full-length product was removed from the beads by restriction digestion, amplified by PCR, 
cloned into a suitable vector and transformed into E. coli. To investigate the influence of the 
MutS treatment, 20 clones from each reassembly reaction arm were randomly picked, the 
respective plasmids isolated and the integrity of the inserted open reading frame checked by 
sequencing. 

Results: Sequence comparison revealed that the MutS treatment increased the 
yield of correct open reading frames for the gene 1658 OT5 substantially. 

Example 5 : Gene Reassembly 

The following example describes manipulation of three related parental 
nucleotide sequences using gene reassembly. The three related parental nucleotide 
sequences were aligned in the computer to determine demarcation points, and 17 such points 
were identified. Once each demarcation point was determined, the system determined the 
sequence of the 18 different fragments that would make up each parental gene. Each 
fragment from the parental sequence had a unique 5' and 3' overhang so only genes in the 
proper order could be reassembled by the computer. Because there were 18 fragments and 
three parents, the system had a total of 18 x 3 = 54 fragments to analyze. It is 
advantageous for the system to pre-ligate each of the fragments in order to store 
datafiles corresponding to every possible combination of pre-ligated fragments. This allows 
the system to determine the proper quantities of each pre-ligated fragment at each step in the 
ligation reaction in order to generate a resulting progeny population that has a predetermined 
PDF. Thus, in this example, the computer determined and stored the following pre-ligated 
sequences into its memory for EACH parent sequence. Accordingly, the following 
pre-ligation method is carried out on each parent sequence, and the resulting data are stored 
in the computer. 

The nomenclature "F1_1" refers to the first fragment from the chosen parental 
sequence. The nomenclature "F1_5" corresponds, as shown below, to a dataset comprising a 
combination of the first, second, third, fourth and fifth fragments of the chosen parental 
sequence. Thus, the following listing illustrates that the system can generate a dataset that 
stores every possible pre-ligated fragment for a given parent. This dataset is then used by the 
system to determine the proper quantities of each pre-ligated fragment to result in the desired 
final crossover population of progeny chimeric sequences. 

Listing of Pre-Ligation Dataset for a Parent Sequence having 18 fragments. 

F1_1 = F1_1
F1_2 = F1_1 + F2_2
F1_3 = F1_1 + F2_2 + F3_3
F1_4 = F1_1 + F2_2 + F3_3 + F4_4
F1_5 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5
F1_6 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5 + F6_6
F1_7 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7
F1_8 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8
F1_9 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9
F1_10 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10
F1_11 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11
F1_12 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12
F1_13 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13
F1_14 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14
F1_15 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15
F1_16 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16
F1_17 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17
F1_18 = F1_1 + F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F2_2 = F2_2
F2_3 = F2_2 + F3_3
F2_4 = F2_2 + F3_3 + F4_4
F2_5 = F2_2 + F3_3 + F4_4 + F5_5
F2_6 = F2_2 + F3_3 + F4_4 + F5_5 + F6_6
F2_7 = F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7
F2_8 = F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8
F2_9 = F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9
F2_10 = F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10
F2_11 = F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11
F2_12 = F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12
F2_13 = F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13
F2_14 = F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14
F2_15 = F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15
F2_16 = F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16
F2_17 = F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17
F2_18 = F2_2 + F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F3_3 = F3_3
F3_4 = F3_3 + F4_4
F3_5 = F3_3 + F4_4 + F5_5
F3_6 = F3_3 + F4_4 + F5_5 + F6_6
F3_7 = F3_3 + F4_4 + F5_5 + F6_6 + F7_7
F3_8 = F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8
F3_9 = F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9
F3_10 = F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10
F3_11 = F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11
F3_12 = F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12
F3_13 = F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13
F3_14 = F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14
F3_15 = F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15
F3_16 = F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16
F3_17 = F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17
F3_18 = F3_3 + F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F4_4 = F4_4
F4_5 = F4_4 + F5_5
F4_6 = F4_4 + F5_5 + F6_6
F4_7 = F4_4 + F5_5 + F6_6 + F7_7
F4_8 = F4_4 + F5_5 + F6_6 + F7_7 + F8_8
F4_9 = F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9
F4_10 = F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10
F4_11 = F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11
F4_12 = F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12
F4_13 = F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13
F4_14 = F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14
F4_15 = F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15
F4_16 = F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16
F4_17 = F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17
F4_18 = F4_4 + F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F5_5 = F5_5
F5_6 = F5_5 + F6_6
F5_7 = F5_5 + F6_6 + F7_7
F5_8 = F5_5 + F6_6 + F7_7 + F8_8
F5_9 = F5_5 + F6_6 + F7_7 + F8_8 + F9_9
F5_10 = F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10
F5_11 = F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11
F5_12 = F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12
F5_13 = F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13
F5_14 = F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14
F5_15 = F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15
F5_16 = F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16
F5_17 = F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17
F5_18 = F5_5 + F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F6_6 = F6_6
F6_7 = F6_6 + F7_7
F6_8 = F6_6 + F7_7 + F8_8
F6_9 = F6_6 + F7_7 + F8_8 + F9_9
F6_10 = F6_6 + F7_7 + F8_8 + F9_9 + F10_10
F6_11 = F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11
F6_12 = F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12
F6_13 = F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13
F6_14 = F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14
F6_15 = F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15
F6_16 = F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16
F6_17 = F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17
F6_18 = F6_6 + F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F7_7 = F7_7
F7_8 = F7_7 + F8_8
F7_9 = F7_7 + F8_8 + F9_9
F7_10 = F7_7 + F8_8 + F9_9 + F10_10
F7_11 = F7_7 + F8_8 + F9_9 + F10_10 + F11_11
F7_12 = F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12
F7_13 = F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13
F7_14 = F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14
F7_15 = F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15
F7_16 = F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16
F7_17 = F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17
F7_18 = F7_7 + F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F8_8 = F8_8
F8_9 = F8_8 + F9_9
F8_10 = F8_8 + F9_9 + F10_10
F8_11 = F8_8 + F9_9 + F10_10 + F11_11
F8_12 = F8_8 + F9_9 + F10_10 + F11_11 + F12_12
F8_13 = F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13
F8_14 = F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14
F8_15 = F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15
F8_16 = F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16
F8_17 = F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17
F8_18 = F8_8 + F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F9_9 = F9_9
F9_10 = F9_9 + F10_10
F9_11 = F9_9 + F10_10 + F11_11
F9_12 = F9_9 + F10_10 + F11_11 + F12_12
F9_13 = F9_9 + F10_10 + F11_11 + F12_12 + F13_13
F9_14 = F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14
F9_15 = F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15
F9_16 = F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16
F9_17 = F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17
F9_18 = F9_9 + F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F10_10 = F10_10
F10_11 = F10_10 + F11_11
F10_12 = F10_10 + F11_11 + F12_12
F10_13 = F10_10 + F11_11 + F12_12 + F13_13
F10_14 = F10_10 + F11_11 + F12_12 + F13_13 + F14_14
F10_15 = F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15
F10_16 = F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16
F10_17 = F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17
F10_18 = F10_10 + F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F11_11 = F11_11
F11_12 = F11_11 + F12_12
F11_13 = F11_11 + F12_12 + F13_13
F11_14 = F11_11 + F12_12 + F13_13 + F14_14
F11_15 = F11_11 + F12_12 + F13_13 + F14_14 + F15_15
F11_16 = F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16
F11_17 = F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17
F11_18 = F11_11 + F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F12_12 = F12_12
F12_13 = F12_12 + F13_13
F12_14 = F12_12 + F13_13 + F14_14
F12_15 = F12_12 + F13_13 + F14_14 + F15_15
F12_16 = F12_12 + F13_13 + F14_14 + F15_15 + F16_16
F12_17 = F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17
F12_18 = F12_12 + F13_13 + F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F13_13 = F13_13
F13_14 = F13_13 + F14_14
F13_15 = F13_13 + F14_14 + F15_15
F13_16 = F13_13 + F14_14 + F15_15 + F16_16
F13_17 = F13_13 + F14_14 + F15_15 + F16_16 + F17_17
F13_18 = F13_13 + F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F14_14 = F14_14
F14_15 = F14_14 + F15_15
F14_16 = F14_14 + F15_15 + F16_16
F14_17 = F14_14 + F15_15 + F16_16 + F17_17
F14_18 = F14_14 + F15_15 + F16_16 + F17_17 + F18_18

F15_15 = F15_15
F15_16 = F15_15 + F16_16
F15_17 = F15_15 + F16_16 + F17_17
F15_18 = F15_15 + F16_16 + F17_17 + F18_18

F16_16 = F16_16
F16_17 = F16_16 + F17_17
F16_18 = F16_16 + F17_17 + F18_18

F17_17 = F17_17
F17_18 = F17_17 + F18_18

F18_18 = F18_18
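
The listing above follows a single rule: Fi_j is the run of consecutive fragments i through j of one parent. A minimal sketch of generating such a dataset is given below; the function name `preligation_dataset` is hypothetical, and the sketch stores fragment labels only, not sequences.

```python
def preligation_dataset(n_fragments):
    """Map each label Fi_j to the ordered list of single fragments it contains."""
    return {
        f"F{i}_{j}": [f"F{k}_{k}" for k in range(i, j + 1)]
        for i in range(1, n_fragments + 1)
        for j in range(i, n_fragments + 1)
    }


dataset = preligation_dataset(18)
print(len(dataset))                 # n(n+1)/2 entries for n fragments
print(" + ".join(dataset["F1_5"]))
```

For 18 fragments this gives 18 x 19 / 2 = 171 pre-ligated fragments per parent.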

Once the sequence of each pre-ligated fragment is determined, the system 
begins to estimate the portions of each pre-ligated sequence to be used to generate the desired 
PDF. As discussed above, the ligation reaction for a sequence having 18 fragments 
preferably takes place as 18 separate reactions. Thus, the system generates a starting set of 
ligation reactions for each of the 18 separate ligations. It should be noted that each ligation 
step uses progressively fewer of the pre-ligated molecules. This is due to the fact that, for 
example, the third step of the ligation reaction would not require pre-ligated fragments 
starting with fragment 1 (F1) or fragment 2 (F2), since these fragments have already been 
ligated to other fragments by the third step in the ligation. At step three, there should only 
be ligation of fragments that bind to the third fragment from each parent. 

For example, the following are exemplary ligation reactions that take place 
within the memory of the computer system. 

Number of Ligation Steps: 18 
Simulated Ligation volume of each step (ul): 100 



Ligation Step #1     Ligation Step #2     Ligation Step #3     Ligation Step #4     Ligation Step #5
0.6 ul of F1_1       0.7 ul of F2_2       0.7 ul of F3_3       0.8 ul of F4_4       1.0 ul of F5_5
1.2 ul of F1_2       1.3 ul of F2_3       1.5 ul of F3_4       1.7 ul of F4_5       1.9 ul of F5_6
1.8 ul of F1_3       2.0 ul of F2_4       2.2 ul of F3_5       2.5 ul of F4_6       2.9 ul of F5_7
2.3 ul of F1_4       2.6 ul of F2_5       2.9 ul of F3_6       3.3 ul of F4_7       3.8 ul of F5_8
2.9 ul of F1_5       3.3 ul of F2_6       3.7 ul of F3_7       4.2 ul of F4_8       4.8 ul of F5_9
3.5 ul of F1_6       3.9 ul of F2_7       4.4 ul of F3_8       5.0 ul of F4_9       5.7 ul of F5_10
4.1 ul of F1_7       4.6 ul of F2_8       5.1 ul of F3_9       5.8 ul of F4_10      6.7 ul of F5_11
4.7 ul of F1_8       5.2 ul of F2_9       5.9 ul of F3_10      6.7 ul of F4_11      7.6 ul of F5_12
5.3 ul of F1_9       5.9 ul of F2_10      6.6 ul of F3_11      7.5 ul of F4_12      8.6 ul of F5_13
5.8 ul of F1_10      6.5 ul of F2_11      7.4 ul of F3_12      8.3 ul of F4_13      9.5 ul of F5_14
6.4 ul of F1_11      7.2 ul of F2_12      8.1 ul of F3_13      9.2 ul of F4_14      10.5 ul of F5_15
7.0 ul of F1_12      7.8 ul of F2_13      8.8 ul of F3_14      10.0 ul of F4_15     11.4 ul of F5_16
7.6 ul of F1_13      8.5 ul of F2_14      9.6 ul of F3_15      10.8 ul of F4_16     12.4 ul of F5_17
8.2 ul of F1_14      9.2 ul of F2_15      10.3 ul of F3_16     11.7 ul of F4_17     13.3 ul of F5_18
8.8 ul of F1_15      9.8 ul of F2_16      11.0 ul of F3_17     12.5 ul of F4_18
9.4 ul of F1_16      10.5 ul of F2_17     11.8 ul of F3_18
9.9 ul of F1_17      11.1 ul of F2_18
10.5 ul of F1_18

Ligation Step #6     Ligation Step #7     Ligation Step #8     Ligation Step #9     Ligation Step #10
1.1 ul of F6_6       1.3 ul of F7_7       1.5 ul of F8_8       1.8 ul of F9_9       2.2 ul of F10_10
2.2 ul of F6_7       2.6 ul of F7_8       3.0 ul of F8_9       3.6 ul of F9_10      4.4 ul of F10_11
3.3 ul of F6_8       3.8 ul of F7_9       4.5 ul of F8_10      5.5 ul of F9_11      6.7 ul of F10_12
4.4 ul of F6_9       5.1 ul of F7_10      6.1 ul of F8_11      7.3 ul of F9_12      8.9 ul of F10_13
5.5 ul of F6_10      6.4 ul of F7_11      7.6 ul of F8_12      9.1 ul of F9_13      11.1 ul of F10_14
6.6 ul of F6_11      7.7 ul of F7_12      9.1 ul of F8_13      10.9 ul of F9_14     13.3 ul of F10_15
7.7 ul of F6_12      9.0 ul of F7_13      10.6 ul of F8_14     12.7 ul of F9_15     15.6 ul of F10_16
8.8 ul of F6_13      10.3 ul of F7_14     12.1 ul of F8_15     14.5 ul of F9_16     17.8 ul of F10_17
9.9 ul of F6_14      11.5 ul of F7_15     13.6 ul of F8_16     16.4 ul of F9_17     20.0 ul of F10_18
11.0 ul of F6_15     12.8 ul of F7_16     15.2 ul of F8_17     18.2 ul of F9_18
12.1 ul of F6_16     14.1 ul of F7_17     16.7 ul of F8_18
13.2 ul of F6_17     15.4 ul of F7_18
14.3 ul of F6_18

Ligation Step #11    Ligation Step #12    Ligation Step #13    Ligation Step #14    Ligation Step #15
2.8 ul of F11_11     3.6 ul of F12_12     4.8 ul of F13_13     6.7 ul of F14_14     10.0 ul of F15_15
5.6 ul of F11_12     7.1 ul of F12_13     9.5 ul of F13_14     13.3 ul of F14_15    20.0 ul of F15_16
8.3 ul of F11_13     10.7 ul of F12_14    14.3 ul of F13_15    20.0 ul of F14_16    30.0 ul of F15_17
11.1 ul of F11_14    14.3 ul of F12_15    19.0 ul of F13_16    26.7 ul of F14_17    40.0 ul of F15_18
13.9 ul of F11_15    17.9 ul of F12_16    23.8 ul of F13_17    33.3 ul of F14_18
16.7 ul of F11_16    21.4 ul of F12_17    28.6 ul of F13_18
19.4 ul of F11_17    25.0 ul of F12_18
22.2 ul of F11_18

Ligation Step #16    Ligation Step #17    Ligation Step #18
16.7 ul of F16_16    33.3 ul of F17_17    100.0 ul of F18_18
33.3 ul of F16_17    66.7 ul of F17_18
50.0 ul of F16_18
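
The starting volumes tabulated for each ligation step appear to follow one rule: within step i, the volume given to pre-ligated fragment Fi_j is proportional to the number of fragments it spans (j - i + 1), scaled so that the step totals 100 ul. This rule is inferred here from the listed numbers and is not stated in the text; the function name `step_volumes` is hypothetical.

```python
def step_volumes(step, n_fragments=18, total_ul=100.0):
    """Starting volumes (ul) for one ligation step, weighted by pre-ligated length."""
    lengths = range(1, n_fragments - step + 2)  # 1 .. number of fragments left
    denom = sum(lengths)
    return {
        f"F{step}_{step + k - 1}": round(total_ul * k / denom, 1)
        for k in lengths
    }


first_step = step_volumes(1)
print(first_step["F1_1"], first_step["F1_18"])
print(step_volumes(15))
```

A linear weighting of this kind is only a starting point; the volumes would then be adjusted over further rounds of simulated reassembly until the calculated PDF approaches the desired one.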

Carrying out the preceding ligation reactions results in a calculated PDF. 
Thus, the system can then adjust the volumes of each pre-ligated fragment during a further 
round of simulated reassembly until the PDF matches the desired probability function. The 
majority of progeny molecules only have one or two crossover events. Adjusting the 
quantities of the ligation reactions, as shown below, will skew the PDF so that it moves 
towards progeny molecules having more crossover events. 

Computer Systems: 

The methods of the invention, in particular the gene reassembly aspects of the 
invention, can use computer systems to carry out the methods described herein. In one 
aspect, the computer system is a conventional personal computer such as those based on an 
Intel microprocessor and running a Windows operating system. The output of the computer 
system is a fragment PDF that can be used as a recipe for producing reassembled progeny 
genes, and the estimated crossover PDF of those genes. The processing described herein can 
be performed by a personal computer using the MATLAB™ programming language and 
development environment. The invention is not limited to any particular hardware or 
software configuration. For example, computers based on other well-known microprocessors 
and running operating system software such as UNIX™, Linux, MacOS™ and others are 
contemplated. 

Figure 8 illustrates an exemplary software program used in the methods of the 
invention. This "GENECARPENTER™" software program can be used as gene reassembly 
control software, and particularly in the methods of the invention for designing and making 
polynucleotides by iterative assembly of codon building blocks. 

Example 6 : Iterative or combinatorial approach 

In various aspects, this invention incorporates methods comprising 
introducing point mutations or codon mutations (e.g. by GSSM, where all possible amino 
acid substitutions are introduced at each position) followed by selection and/or screening, in 
combination with chimerization among selected products (e.g. positive hits) and/or parental 
sequences, and optionally repeating with one or more selection and/or screening step(s), and 
optionally one or more mutagenesis step(s). The screening or selection criteria according to 
this invention can include increases or decreases in one or more of the following: 
thermotolerance, ability to renature after denaturation by, e.g. heat (e.g. as determined with 
the help of a bomb calorimeter), storage life (e.g. shelf life at various temperatures), 
bioavailability, expression level, resistance to digestive tract destruction or to 
protease-mediated degradation, and activity and/or stability under different environmental 
conditions (e.g. exposure to different pH, pressure, salinity, solvent, etc. conditions). 

Evolution by the GSSM™ method. The GSSM™ method was used to create a 
comprehensive library of point mutations in gene BD7746. A screen for thermotolerance 
was developed which measures the residual activity of an enzyme after heat challenge at high 
temperature. GSSM combined with a xylanase thermotolerance screen identified nine unique 
point mutants that had improved thermal tolerance. All nine mutations were combined in 
one gene using site-directed mutagenesis to generate a 9X mutant enzyme. 

Generation of combinatorial GSSM™ variants using gene reassembly 
technology. To identify variants of the 9 point mutations with highest thermal tolerance and 
activity compared to the 9X variant, a Gene Reassembly library of all possible mutant 
combinations (2^9 = 512) was constructed and screened. Using thermostability as the criterion, 33 
unique combinations of the nine mutations were identified as up-mutants. A secondary 
screen was performed to select for variants with higher activity/expression than the evolved 
9X. This screen yielded 10 variants with sequences possessing between 6 and 8 mutations in 
various combinations. All 10 variants have higher thermotolerance and improved activity 
over the 9X variant. These enzymes were subsequently purified and characterized. 
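
The size of the combinatorial library described above can be checked with a short enumeration. The following Python sketch is illustrative only and is not part of the disclosed methods: the labels m1 to m9 are hypothetical placeholders for the nine point mutations, and each mutation is assumed to be either present or absent independently of the others.

```python
from itertools import product

N_MUTATIONS = 9  # nine point mutations, as stated in the text
# Hypothetical placeholder labels; the actual mutation names are not used here.
LABELS = [f"m{i}" for i in range(1, N_MUTATIONS + 1)]


def all_combinations(labels):
    """Yield every present/absent combination as a tuple of the labels present."""
    for mask in product((0, 1), repeat=len(labels)):
        yield tuple(label for label, bit in zip(labels, mask) if bit)


library = list(all_combinations(LABELS))

# Tally how many variants carry each number of mutations.
by_count = {}
for variant in library:
    by_count[len(variant)] = by_count.get(len(variant), 0) + 1

print(len(library))                           # 512 combinations in total
print(sum(by_count[k] for k in (6, 7, 8)))    # 129 carry between 6 and 8 mutations
```

Under these assumptions the enumeration includes the wild-type sequence (no mutations) and the 9X variant (all nine), and 129 of the 512 combinations fall in the 6 to 8 mutation range reported for the ten selected variants.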

Detailed protocols: 

Gene Site Saturation Mutagenesis and Activity Screening of BD7746. The 
BD7746 gene was amplified by PCR and cloned into the expression vector pTrcHis2 using 
the pTrcHis2 TOPO™ TA Cloning® Kit (Invitrogen, Carlsbad, CA). GSSM was performed 
as described previously (Short, JM 2001) using 64-fold degenerate oligonucleotides to 
randomize at each codon in the gene so that all possible amino acids would be encoded. The 
resultant GSSM library was transformed into XL1-Blue (Stratagene, La Jolla, CA) for 
screening. 
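
The statement that a 64-fold degenerate codon encodes all possible amino acids can be illustrated by expanding a fully degenerate NNN codon against the standard genetic code. This Python sketch is an illustration under stated assumptions, not the oligonucleotide design procedure actually used: it assumes that "64-fold degenerate" means N (A, C, G or T) at all three codon positions, and it uses the standard genetic code written in the conventional TCAG ordering.

```python
from itertools import product

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand_nnn():
    """Return all codons represented by a fully degenerate NNN position."""
    return ["".join(codon) for codon in product(BASES, repeat=3)]


codons = expand_nnn()
encoded = {CODON_TABLE[c] for c in codons}
amino_acids = encoded - {"*"}

print(len(codons))        # 64 distinct codons
print(len(amino_acids))   # 20 amino acids; stop codons are also present
```

Because the codons are not evenly distributed among the amino acids, some residues (for example leucine, with six codons) are expected to appear more often than others (for example tryptophan, with one) among randomly picked clones.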

Individual clones were arrayed in 96-well microtiter plates containing 200 µL 
of LB media and 100 µg/mL ampicillin using an automated colony picker (AutoGen, MA). 
Four 96-well plates were screened per codon. The plates were incubated overnight at 37 °C. 
These master plates were replicated using a 96-well pintool into fresh media containing 
antibiotic. The replica plates were sealed with a gas permeable adhesive film and incubated 
overnight at 37 °C. After incubation, the seals were removed and the plates centrifuged at 
approximately 3000g for 10 minutes. The supernatant was removed and the cells 
resuspended in 45 µL of 100 mM citrate/phosphate buffer (pH 6.0) containing 100 mM KCl 
(CP buffer). The plates were then covered with an adhesive aluminum seal and incubated at 
80 °C for 20 minutes followed by the addition of 30 µL of 2 % Azo-xylan prepared in CP 
buffer and incubation overnight at 37 °C. After incubation, 200 µL of 100 % ethanol was 
added and the plates were centrifuged at approximately 3000g for 10 minutes. The 
supernatant was transferred to fresh plates and absorbance at 590 nm measured to quantify 
residual enzyme activity. 
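
The screening depth of four 96-well plates per codon can be related to the 64 codon variants possible at each position. The following Python sketch is a rough estimate only. It assumes that every well contains a clone, that all 64 codons are equally represented in the library, and that clones are picked independently; none of these assumptions is stated in the protocol above.

```python
PLATES_PER_CODON = 4   # from the protocol
WELLS_PER_PLATE = 96   # from the protocol
CODON_VARIANTS = 64    # 64-fold degenerate oligonucleotides


def prob_variant_sampled(n_clones, n_variants):
    """Probability that one given variant appears at least once among
    n_clones independent picks from n_variants equally likely variants."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_clones


clones_per_codon = PLATES_PER_CODON * WELLS_PER_PLATE
oversampling = clones_per_codon / CODON_VARIANTS
p_sampled = prob_variant_sampled(clones_per_codon, CODON_VARIANTS)

print(clones_per_codon)   # 384 clones screened per codon
print(oversampling)       # 6.0-fold oversampling of the 64 codons
print(f"{p_sampled:.4f}")
```

Under these idealized assumptions, any one codon variant would be expected to be present among the screened clones with a probability of roughly 99.8 %. Real libraries typically show some codon bias, so the actual coverage may be lower.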

All nine mutations were combined in one gene using site-directed mutagenesis 
to generate a 9X mutant enzyme. The 9X gene, the wild-type gene and all nine single mutant 
genes were PCR amplified using primers designed to append an N-terminal hexahistidine 
tag. The PCR products were cloned into pTrcHis2 as described above. 

GeneReassembly™ library construction and screening. The 591 bp XYL7746 
gene (gene plus codons for hexahistidine tag) was divided into 5 segments according to the 
locations of the mutations in the GSSM clones. In this scenario, segments 1 and 3 
corresponded to the wild-type gene while segments 2 and 4 contained 0-4 amino acid 
mutations each and segment 5 contained 0-1 mutations. Three of the segments, 1, 3 and 5 
were produced by PCR where segments 1 and 3 used the wild-type template and segment 5 
was made using two different templates (wild type and mutant S79P). Segments 2 and 4 were 
both made by annealing synthetic oligonucleotides containing 0-4 mutations each. After all 
the segments were made, the library was constructed by first digesting the PCR products of 
segments 1, 3 and 5 to create overhangs compatible with those of the annealed oligomers 2 
and 4. Segments 1-3 and 4-5 were ligated separately. The ligated 1-3 segment was amplified 
by PCR and the product was digested and ligated to segment 4-5. The final library (512 
mutants; segments 1-5) was isolated, cloned into pTrcHis2, transformed into XL1-Blue 
MRF cells (Stratagene, La Jolla, CA) and plated on solid LB medium containing 
100 µg/mL ampicillin. Approximately 4000 colonies were auto-picked (see above) into 
approximately forty 96-well plates and were incubated at 37 °C overnight. The screening 
assay was performed as described above for the screening of the GSSM™ mutant library 
except that the resuspended cells were incubated for 60 minutes at 80 °C followed by 
addition of substrate and incubation of plates at 37 °C for 20 minutes. 
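
The library size of 512 follows from the segment layout described above. The Python sketch below is illustrative and rests on one assumption: "0-4 mutations" in segments 2 and 4 and "0-1 mutations" in segment 5 are read as 4, 4 and 1 independent mutation sites, each either wild type (0) or mutant (1). The dictionary name and the 0/1 encoding are hypothetical conveniences, not part of the protocol.

```python
from itertools import product

# Number of mutation sites carried by each of the five segments
# (segments 1 and 3 are wild type only).
SITES_PER_SEGMENT = {1: 0, 2: 4, 3: 0, 4: 4, 5: 1}


def segment_variants(n_sites):
    """All wild-type/mutant patterns for a segment with n_sites sites."""
    return list(product((0, 1), repeat=n_sites))


variants = {seg: segment_variants(n) for seg, n in SITES_PER_SEGMENT.items()}

# A full-length gene is one variant of each segment, joined in order 1 to 5.
library = list(product(*(variants[seg] for seg in sorted(variants))))

colonies_picked = 4000  # approximate figure from the text
print({seg: len(v) for seg, v in variants.items()})  # {1: 1, 2: 16, 3: 1, 4: 16, 5: 2}
print(len(library))                                  # 512
print(colonies_picked / len(library))                # 7.8125
```

On this reading, the approximately 4000 colonies picked correspond to a little under eight-fold oversampling of the 512 possible assemblies, assuming all assemblies are equally represented after ligation.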

A number of embodiments of the invention have been described. 
Nevertheless, it will be understood that various modifications may be made without 
departing from the spirit and scope of the invention. Accordingly, other embodiments are 
within the scope of the following claims. 